From: mag <ma...@ni...> - 2003-02-22 02:42:16
Hi,

I am considering adding kernel side caching to fuse.

The idea is that there is no reason to incur the overhead of a bounce to
userspace to get some page that was just served to the kernel a second ago.

As long as the userspace program has some interface to make sure that the
kernel expires the pages when they change, it should be a reasonably safe
optimization.

The problem is, I was not able to find any documentation on the interface to
the page cache in linux 2.4. Most of the VFS documentation seems to date to
the 2.2 days. AFAIS, a good start would be to figure out what the analogs of
dcache_Add() and dcache_lookup() are in 2.4.

Do you have any idea where I can find any relevant documentation?

--
Thanks in advance,
mag
From: <Mik...@et...> - 2003-02-24 16:03:34
Hi!

> I am considering adding kernel side caching to fuse.
>
> The idea is that there is no reason to incur the overhead of a bounce to
> userspace to get some page that was just served to the kernel a second ago.
>
> As long as the userspace program has some interface to make sure that the
> kernel expires the pages when they change, it should be a reasonably safe
> optimization.

Can you elaborate on what sort of interface you have in mind?

> The problem is, I was not able to find any documentation on the interface to
> the page cache in linux 2.4. Most of the VFS documentation seems to date to
> the 2.2 days. AFAIS, a good start would be to figure out what the analogs of
> dcache_Add() and dcache_lookup() are in 2.4.
>
> Do you have any idea where I can find any relevant documentation?

Yes, the page-cache is an ugly beast. The best way to understand it is
probably to look at the source code.

Miklos
From: mag <ma...@ni...> - 2003-02-24 19:25:26
On Mon, Feb 24, 2003 at 05:03:13PM +0100, Miklos Szeredi wrote:
> Can you elaborate on what sort of interface you have in mind?

Well, the kernel module figures out which request the incoming data
corresponds to based on unique, so I was thinking of just reserving a
particular value of unique for messages that originate in userspace...

From there such messages would contain a jobtype (such as forget) and the
inode it corresponds to. Based on this information the kernel module would
purge the information related to the inode in question from the cache.

Anything obviously wrong with this idea?

On a somewhat related note, what does FUSE_FORGET do?
I am not really clear on why the kernel module randomly issues them....

> > Do you have any idea where I can find any relevant documentation?
>
> Yes, the page-cache is an ugly beast. The best way to understand it is
> probably to look at the source code.

Yeah... I made the mistake of first looking at the ext2 driver, which was
thoroughly underdocumented and confusing, but some of the other drivers
(such as jfs) seem much better written, so perhaps I can figure out what to
do based on them...

--
Live long and prosper,
mag
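[Editorial sketch] The notification idea in the message above could look roughly like the following. This is only an illustration of the proposal, not code from fuse: every name here (the `toy_` types, the reserved value of `unique`, the job type constant) is a hypothetical stand-in, and the per-inode cache is modelled as a plain array rather than real kernel state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none come from the fuse sources. */

/* Assumed reserved "unique" value marking a message that originates in
 * userspace instead of answering a pending kernel request. */
#define TOY_UNIQUE_NOTIFY 0u

enum toy_jobtype { TOY_JOB_FORGET = 1 };

struct toy_notify_msg {
    uint32_t unique;   /* TOY_UNIQUE_NOTIFY for notifications */
    uint32_t jobtype;  /* what to do, e.g. TOY_JOB_FORGET */
    uint64_t ino;      /* inode the job applies to */
};

/* Stand-in for whatever the kernel has cached for one inode. */
struct toy_inode_cache {
    uint64_t ino;
    int cached_pages;
};

/* Returns 1 if the message was a notification and was handled, 0 if it
 * is an ordinary reply or an unknown job type (ignored in this sketch). */
static int toy_handle_incoming(const struct toy_notify_msg *msg,
                               struct toy_inode_cache *caches, size_t n)
{
    assert(msg != NULL && caches != NULL);

    if (msg->unique != TOY_UNIQUE_NOTIFY)
        return 0; /* ordinary reply: matched to a request elsewhere */
    if (msg->jobtype != TOY_JOB_FORGET)
        return 0;

    for (size_t i = 0; i < n; i++) {
        if (caches[i].ino == msg->ino)
            caches[i].cached_pages = 0; /* purge cached data for this inode */
    }
    return 1;
}
```

The point of the reserved `unique` value is that the existing reply path needs only one extra comparison to tell a notification apart from a normal reply.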
From: <Mik...@et...> - 2003-02-26 08:54:15
Hi,

> Well, the kernel module figures out which request the incoming data
> corresponds to based on unique, so I was thinking of just reserving a
> particular value of unique for messages that originate in userspace...
>
> From there such messages would contain a jobtype (such as forget) and the
> inode it corresponds to. Based on this information the kernel module would
> purge the information related to the inode in question from the cache.
>
> Anything obviously wrong with this idea?

Filesystems are usually implemented in a synchronous way: the kernel
requests an operation and the filesystem gives some result. There are
possibilities for notifications (like notifying the kernel when a
file/directory has changed) but these are less often used.

So I believe that this caching mechanism should also be synchronous:
whenever there is a request for a cached file, the kernel should ask
whether the file is up-to-date.

> On a somewhat related note, what does FUSE_FORGET do?
> I am not really clear on why the kernel module randomly issues them....

The VFS keeps a cache of inodes. When an inode is removed from this
cache, the kernel notifies the userspace that this inode is no longer
in the kernel, so the userspace part can be thrown away too.

> In fact as far as I can see, at least file reads in fuse should be cached.
> Once readpage fills the page and sets it uptodate, it should go into the
> pagecache (generic_file_read takes care of that) and it is my understanding
> (is this correct?)

Yes.

> that VFS should not initiate another readpage on the same page as
> long as that page stays uptodate and in cache. At least that's what
> it seems like should happen from looking at the NFS client
> code. Their readpage implementation always uses RPC to fetch the
> page from the remote server and then just sets it uptodate, as does
> fuse, and their read_file is just a thin wrapper around
> generic_file_read, again very similar to fuse. And yet, if you read
> the same file twice, it only retrieves its content over the network
> the first time, whereas with fuse, no matter how many times I read
> the same file over and over again, a readpage is issued every time!

This is intentional: in fuse_open() the invalidate_inode_pages() call
removes all cached pages from the file. This is needed, since at the
moment there is no versioning or other means of telling the kernel
whether a file has changed since it was last opened.

> PS. On a related note, maybe fuse_readpage should not set the page up-to-date
> if the actual read size is smaller than PAGE_CACHE_SIZE instead of just
> silently padding it with zeros?? I guess it does not matter if the cache is
> not working anyway, but it somehow seems like a bad idea....

The idea is that if the read request returns a short result (less
than PAGE_CACHE_SIZE), then it must be at the end of file, and the
rest can safely be zeroed.

So the userspace is not allowed to return a short read, except at EOF.

Miklos
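[Editorial sketch] The short-read rule described above amounts to "copy what userspace returned, zero the rest of the page". A minimal userspace illustration follows, assuming a 4096-byte page; `TOY_PAGE_SIZE` and `toy_fill_page` are hypothetical names and this is not the actual fuse_readpage code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's PAGE_CACHE_SIZE; 4096 is assumed. */
#define TOY_PAGE_SIZE 4096

/* Copy the `got` bytes returned by userspace into `page` and zero the
 * remainder. Under the rule above, got < TOY_PAGE_SIZE is only legal when
 * the read reached end of file, so the zeroed tail lies past EOF. */
static void toy_fill_page(unsigned char *page, const unsigned char *data,
                          size_t got)
{
    assert(got <= TOY_PAGE_SIZE);
    memcpy(page, data, got);
    memset(page + got, 0, TOY_PAGE_SIZE - got);
}
```

If a userspace filesystem returned a short read in the middle of a file, this scheme would cache zeros in place of real data, which is why short reads are restricted to EOF.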
From: Michael G. <ma...@ni...> - 2003-02-26 22:10:20
Hello,

On Wed, Feb 26, 2003 at 09:53:45AM +0100, Miklos Szeredi wrote:
> Filesystems are usually implemented in a synchronous way: the kernel
> requests an operation and the filesystem gives some result. There are
> possibilities for notifications (like notifying the kernel when a
> file/directory has changed) but these are less often used.
>
> So I believe that this caching mechanism should also be synchronous:
> whenever there is a request for a cached file, the kernel should ask
> whether the file is up-to-date.

Yeah... But the problem here is that just having the kernel side validate
the cache on every open is not frequent enough, and doing so on every
readpage is very slow, so I think that user->kernel notifications are
definitely the way to go. Unless you have a better idea that is....

What we could do to give the user maximum flexibility is flush the cache for
the file depending on the return result of open. That way if you want to use
notifications, you leave the cache, but if you want to get the current
behaviour, where every readpage goes through userspace, you would choose to
clear the cache.

The easiest way is prolly to have the user->kernel forget message just
contain the inode number of the file it relates to. I can get a pointer to
the inode structure using the inode_hashtable, but I would need to have a
pointer to the superblock of the corresponding filesystem, so the question
is, how do I identify which filesystem it belongs to? Is there some sort of
a hash table for devices that I can use similarly, or do I have to keep my
own hash? And how do I know my device id from user-space?

> This is intentional: in fuse_open() the invalidate_inode_pages() call
> removes all cached pages from the file.

Hehe, yeah. I figured that out last night myself. Took me a while....
I was only looking over the read functions, so I was going pretty crazy
because it looked like the pages should have been getting cached, but they
were not :)

> The idea is that if the read request returns a short result (less
> than PAGE_CACHE_SIZE), then it must be at the end of file, and the
> rest can safely be zeroed.
>
> So the userspace is not allowed to return a short read, except at EOF.

That's what I presumed. Normally, though, reads are allowed to be short, so
maybe it should be explicitly documented somewhere.

--
Live long and prosper,
mag
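[Editorial sketch] The proposal above, letting the result of open decide whether cached pages survive, can be modelled with a toy page cache. This is a userspace simulation under stated assumptions, not kernel code: `toy_file`, `toy_read_page`, `toy_open` and the `keep_cache` flag are all hypothetical, and a counter stands in for the round trip to userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NPAGES 4

struct toy_file {
    bool uptodate[TOY_NPAGES]; /* which pages are valid in the cache */
    int readpage_calls;        /* round trips to userspace so far */
};

/* Read one page: only goes to "userspace" if the page is not cached. */
static void toy_read_page(struct toy_file *f, int idx)
{
    assert(idx >= 0 && idx < TOY_NPAGES);
    if (f->uptodate[idx])
        return;              /* served from the cache */
    f->readpage_calls++;     /* stands in for the request to userspace */
    f->uptodate[idx] = true;
}

/* Open: keep_cache is the hypothetical flag carried by the open result.
 * false reproduces the behaviour described in the thread (drop all pages
 * on every open); true trusts userspace to send invalidations itself. */
static void toy_open(struct toy_file *f, bool keep_cache)
{
    if (!keep_cache)
        memset(f->uptodate, 0, sizeof f->uptodate);
}
```

With `keep_cache` false, re-opening and re-reading a file costs one `readpage` per page every time; with it true, repeated reads are served from the cache until something invalidates the pages.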
From: Michael G. <ma...@ni...> - 2003-02-26 00:59:27
Hi,

On Mon, Feb 24, 2003 at 05:03:13PM +0100, Miklos Szeredi wrote:
> Yes, the page-cache is an ugly beast. The best way to understand it is
> probably to look at the source code.

Hmmm... Interesting. I looked over the source for quite a few other
filesystems (including nfs, which is not backed by a block device, and thus
cannot possibly be relying on the block cache for caching) and I cannot see
a whole lot of difference from what fuse is doing.

In fact as far as I can see, at least file reads in fuse should be cached.
Once readpage fills the page and sets it uptodate, it should go into the
pagecache (generic_file_read takes care of that) and it is my understanding
(is this correct?) that VFS should not initiate another readpage on the same
page as long as that page stays uptodate and in cache. At least that's what
it seems like should happen from looking at the NFS client code. Their
readpage implementation always uses RPC to fetch the page from the remote
server and then just sets it uptodate, as does fuse, and their read_file is
just a thin wrapper around generic_file_read, again very similar to fuse.
And yet, if you read the same file twice, it only retrieves its content over
the network the first time, whereas with fuse, no matter how many times I
read the same file over and over again, a readpage is issued every time!

Any idea what's going on here? Is fuse somehow purposefully dodging the
caching to always redirect all requests to userspace? If so, how?

PS. On a related note, maybe fuse_readpage should not set the page up-to-date
if the actual read size is smaller than PAGE_CACHE_SIZE instead of just
silently padding it with zeros?? I guess it does not matter if the cache is
not working anyway, but it somehow seems like a bad idea....

--
Live long and prosper,
mag