From: abulford <abu...@gm...> - 2013-09-06 11:13:02
|
Hi all,

I'm implementing a FUSE filesystem which talks to a RESTful API to get files and file information, sometimes over a high-latency network. Both the file information and the files themselves are cached. Typically I use getattr to trigger a HEAD request to the API if I don't already have the file information cached; a subsequent open results in a GET request to the API.

I'm finding that the getattr calls appear to be coming in sequentially: while a single getattr call is in progress, no other getattr call is made. Since a getattr can result in a HEAD request, which can take up to 1 second on the high-latency network, no other file access can happen while the HEAD request is taking place, which could be problematic for me.

I am running in the default multi-threaded mode and am finding that other requests, such as open, do seem able to run in parallel, so I'm unsure why getattr would be forced to run sequentially.

I've already posted this question on Stack Overflow, with a code sample and a bit more detail: http://stackoverflow.com/questions/18471238/should-the-fuse-getattr-operation-always-be-single-threaded

I've tried this on two boxes: Kubuntu (kernel version 3.8.0, FUSE version 2.9) and CentOS running on Xen (kernel version 2.6.18, FUSE version 2.7.4). I get the same results on both.

I'm quite concerned about this because an issue with the endpoint or network could result in all file access being blocked, even if the files are cached locally. Am I completely missing something, or is this a known/intentional constraint in FUSE?

Thanks,
Andy

--
View this message in context: http://fuse.996288.n3.nabble.com/GetAttr-calls-being-serialised-tp11741.html
Sent from the Fuse mailing list archive at Nabble.com. |
From: David S. <da...@da...> - 2013-09-12 20:27:34
|
On Fri, Sep 6, 2013 at 4:12 AM, abulford <abu...@gm...> wrote:
> I'm finding that the getattr calls appear to be coming in sequentially, so
> if a single getattr call is taking place then no other getattr call will be
> made. Since a getattr can result in a HEAD request, which can take up to 1
> second on the high latency network, this means no other file access can
> happen while the HEAD request is taking place, which could be problematic
> for me.

The main reason you'd see a long sequence of getattr calls is generally a follow-up to information returned from readdir. In our file systems, we try to pre-fetch and cache the attribute information (which will inevitably be requested) during readdir so we don't have to hit the backend for each one.

--
David Strauss | da...@da... | +1 512 577 5827 [mobile] |
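The pre-fetch approach described above might look roughly like the following. This is only a minimal sketch, not code from any of the file systems mentioned in this thread: `AttrCache`, `list_directory` and `head_request` are hypothetical names, and it assumes the backend can return attributes for every entry of a directory in one call.

```python
import posixpath
import threading


class AttrCache:
    """Path -> attributes cache that readdir fills in for a whole directory."""

    def __init__(self, list_directory, head_request):
        # Both callables are hypothetical stand-ins for backend requests:
        #   list_directory(dir_path) -> {name: attrs} for every entry
        #   head_request(path)       -> attrs for a single entry
        self._list_directory = list_directory
        self._head_request = head_request
        self._attrs = {}
        self._lock = threading.Lock()

    def readdir(self, dir_path):
        # One backend round trip caches attributes for all entries.
        entries = self._list_directory(dir_path)
        with self._lock:
            for name, attrs in entries.items():
                self._attrs[posixpath.join(dir_path, name)] = attrs
        return sorted(entries)

    def getattr(self, path):
        with self._lock:
            cached = self._attrs.get(path)
        if cached is not None:
            return cached
        # Slow path: the entry was never listed, so ask the backend.
        attrs = self._head_request(path)
        with self._lock:
            self._attrs[path] = attrs
        return attrs
```

The lock is never held during a backend request, so a slow request does not block cache hits for other paths. Expiry and invalidation are left out of the sketch.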
From: abulford <abu...@gm...> - 2013-09-16 17:00:37
|
> The main reason you'd see a long sequence of getattr calls is
> generally a follow-up to information returned from readdir. In our
> file systems, we try to pre-fetch and cache the attribute information
> (which will inevitably be requested) during readdir so we don't have
> to hit the backend for each one.

I see what you mean - normally a user might list the content of a directory, which would result in a readdir followed by a getattr call for each file in the directory, and there could be lots, generating a string of sequential getattr calls, like those my question asks about. In my situation my test harness is specifically trying to open a large number of known file paths all at the same time (using multiple threads). It's calling fopen on the full path, rather than determining the contents of its parent directory through readdir, so unfortunately I'm unable to make use of readdir to pre-fetch the content as you suggest.

Your comment does relate to something I've recently discovered, though - the sequential access is only on a per directory basis. So, for example, if the test harness opens '/mnt/fs/dir1/file1.ext' and '/mnt/fs/dir1/file2.ext' then I first get the getattr call for '/dir1/file1.ext', and only once it's finished (which is slow when I intentionally put a delay in the endpoint) do I get the getattr call for '/dir1/file2.ext'. However, if my test harness opens '/mnt/fs/dir1/file1.ext' and '/mnt/fs/dir2/file2.ext' then I get both getattr calls immediately, so they are both executing at the same time. Essentially, if paths are in different parts of the tree they don't seem to affect each other. When looking through the source code earlier I noticed mention of trees, so I'm going to have a bit of a closer look.

I've also found that subsequent calls of getattr to the same path are not forced to run sequentially. Due to the caching in my file system it wasn't immediately obvious, because subsequent getattr calls are always very quick (getting the information from the cache instead of the endpoint), so it wouldn't really matter if they all happened sequentially. However, I've noticed that when running in a mode where information is not cached, multiple getattrs are still able to run in parallel, even within the same directory, if getattr has been called for the path before. I guess this is a result of FUSE's caching, something else I will look into. |
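One way to observe the behaviour described above is to stat a set of paths from separate threads and record when each call starts and ends. The following is a minimal sketch and not the test harness used in this thread: `time_concurrent_stats` is a hypothetical helper, and it uses `os.stat` rather than fopen, on the assumption that a stat on a path not seen before also triggers a lookup.

```python
import os
import threading
import time


def time_concurrent_stats(paths):
    """Stat each path from its own thread; return {path: (start, end)} in seconds."""
    if not paths:
        return {}
    timings = {}
    lock = threading.Lock()
    barrier = threading.Barrier(len(paths))
    origin = time.monotonic()

    def worker(path):
        barrier.wait()  # release all threads at (nearly) the same moment
        start = time.monotonic() - origin
        try:
            os.stat(path)
        except OSError:
            pass  # a missing entry still exercises the lookup
        end = time.monotonic() - origin
        with lock:
            timings[path] = (start, end)

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return timings
```

If the intervals for two paths in the same directory follow one another while those for paths in different directories overlap, that matches the per-directory pattern reported here. Each path needs a fresh (uncached) lookup for the effect to show.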
From: Miklos S. <mi...@sz...> - 2013-09-20 10:18:44
|
On Mon, Sep 16, 2013 at 7:00 PM, abulford <abu...@gm...> wrote:
>
>> The main reason you'd see a long sequence of getattr calls is
>> generally a follow-up to information returned from readdir. In our
>> file systems, we try to pre-fetch and cache the attribute information
>> (which will inevitably be requested) during readdir so we don't have
>> to hit the backend for each one.
>
> I see what you mean - normally a user might list the content of a directory,
> which would result in a readdir followed by a getattr call for each file in
> the directory, and there could be lots, generating a string of sequential
> getattr calls, like those my question asks about. In my situation my test
> harness is specifically trying to open a large number of known file paths
> all at the same time (using multiple threads). It's calling fopen on the
> full path, rather than determining the contents of its parent directory
> through readdir, so unfortunately I'm unable to make use of readdir to
> pre-fetch the content as you suggest.
>
> Your comment does relate to something I've recently discovered, though - the
> sequential access is only on a per directory basis.

Lookup (i.e. first finding the file associated with a name) is serialized per directory. This is in the VFS (the common filesystem part in the kernel), so basically any filesystem is susceptible to this issue, not just fuse.

And you can still do what David suggested, despite not using readdir: the filesystem code detects that multiple entries are being looked up in a directory, so it triggers an internal readdir request to prime the cache and then subsequent lookups can be served quickly.

Thanks,
Miklos |
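The effect of per-directory serialization can be illustrated with a toy model. The sketch below is not how the kernel implements it. It simply assumes one lock per parent directory and a fixed delay per lookup, and `simulate_lookups` is a hypothetical name.

```python
import posixpath
import threading
import time
from collections import defaultdict


def simulate_lookups(paths, delay):
    """Toy model: each lookup holds its parent directory's lock for `delay` seconds.

    Returns the elapsed wall-clock time for all lookups, started concurrently.
    """
    dir_locks = defaultdict(threading.Lock)
    table_lock = threading.Lock()

    def lookup(path):
        parent = posixpath.dirname(path)
        with table_lock:           # protect creation of per-directory locks
            lock = dir_locks[parent]
        with lock:                 # lookups in the same directory queue here
            time.sleep(delay)      # stands in for the slow backend request

    threads = [threading.Thread(target=lookup, args=(p,)) for p in paths]
    started = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.monotonic() - started
```

In this model, two lookups under '/dir1' take about twice the delay, while one under '/dir1' and one under '/dir2' take about one delay. This is the pattern reported earlier in the thread.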
From: Andrew B. <abu...@gm...> - 2013-09-20 11:28:58
|
On Fri, Sep 20, 2013 at 11:18 AM, Miklos Szeredi <mi...@sz...> wrote:
> Lookup (i.e. first finding the file associated with a name) is
> serialized per directory. This is in the VFS (the common filesystem
> part in the kernel), so basically any filesystem is susceptible to
> this issue, not just fuse.

I understand, thanks for the explanation.

> And you can still do what David suggested, despite not using readdir:
> the filesystem code detects that multiple entries are being looked up
> in a directory, so it triggers an internal readdir request to prime
> the cache and then subsequent lookups can be served quickly.

I'm afraid I don't understand what you mean by this - I have readdir implemented, and it just writes to a log to say it's been called, but the log isn't showing any calls coming through. Is an 'internal readdir' different to the readdir in my FUSE implementation? And if so, how can I hook into this call?

Many thanks,
Andrew |
From: Miklos S. <mi...@sz...> - 2013-09-20 11:46:55
|
On Fri, Sep 20, 2013 at 1:28 PM, Andrew Bulford <abu...@gm...> wrote:
>> And you can still do what David suggested, despite not using readdir:
>> the filesystem code detects that multiple entries are being looked up
>> in a directory, so it triggers an internal readdir request to prime
>> the cache and then subsequent lookups can be served quickly.
>
> I'm afraid I don't understand what you mean by this - I have readdir
> implemented and just writing to a log to say it's been called, but it's not
> showing any calls coming through. Is an 'internal readdir' different to the
> readdir in my FUSE implementation? And if so, how can I hook in to this
> call?

I'm not familiar with the RESTful API. If you can't enumerate the files in a directory then my idea isn't going to help.

If you can enumerate all files (i.e. readdir) then you can effectively prime your cache in your ->getattr() implementation, which means no more slow, serialized requests over the net.

Thanks,
Miklos |
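Priming the cache from within getattr, as suggested above, could be sketched as follows. This is an illustration under assumptions rather than a recommended implementation: `PrimingAttrCache` and `list_directory` are hypothetical names, and the sketch only applies when the backend can enumerate a directory with attributes in a single request.

```python
import posixpath
import threading


class PrimingAttrCache:
    """On the first miss in a directory, fetch attributes for all its entries."""

    def __init__(self, list_directory):
        # Hypothetical backend call: list_directory(dir_path) -> {name: attrs}
        self._list_directory = list_directory
        self._attrs = {}
        self._primed = set()   # directories that have already been enumerated
        self._lock = threading.Lock()

    def getattr(self, path):
        parent = posixpath.dirname(path)
        with self._lock:
            if path in self._attrs:
                return self._attrs[path]
            already_primed = parent in self._primed
        if not already_primed:
            # One slow request covers every sibling of `path`.
            entries = self._list_directory(parent)
            with self._lock:
                for name, attrs in entries.items():
                    self._attrs[posixpath.join(parent, name)] = attrs
                self._primed.add(parent)
        with self._lock:
            # None means the entry does not exist in the listing.
            return self._attrs.get(path)
```

Only the first getattr in a directory pays for a backend request. Later lookups in the same directory are answered from memory, so their serialization costs little. A real handler would also need expiry and would report a missing entry as ENOENT.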
From: Andrew B. <abu...@gm...> - 2013-09-23 08:52:41
|
On Fri, Sep 20, 2013 at 12:46 PM, Miklos Szeredi <mi...@sz...> wrote:
> I'm not familiar with the RESTful API. If you can't enumerate the
> files in a directory then my idea isn't going to help.
> If you can enumerate all files (i.e. readdir) then you can effectively
> prime your cache in your ->getattr() implementation which means no
> more slow, serialized requests over the net.

The RESTful API doesn't implement directories. Keys to objects might happen to contain forward slashes, but these are not interpreted in any special way by the API. Thank you for your suggestion, but unfortunately I don't think I'll be able to prime the cache in the way you suggest.

Luckily I don't think this should be as much of an issue for me as I first expected. The file system's clients do use directories with quite a good spread, so it's going to be pretty rare that multiple getattr requests arrive at the same time in the same directory.

Many thanks for your help, it's good to know I didn't just have a config wrong!

Andy |