From: Chris N. <the...@gm...> - 2010-12-27 13:13:35
|
Hi,

I'm currently working on writing a thin file system shim using FUSE. I would do this using something like the Minifilter infrastructure under Windows, but alas Linux seemingly has no such facility (at least none that is generally applicable to distributions people actually use)! The underlying file system would be a normally supported Linux FS - I'm not looking to implement encryption, compression or anything that alters what goes to disk.

The project I'm working on requires that order of operations be preserved - if an application using the FUSE mountpoint I intend to provide writes offsets 123, 11 and 1024 in that order, then I must preserve that order.

From my tinkering, would I be right in saying that the only ways of achieving this are:

* Use the low-level FUSE API and rely on the value of the unique field of the fuse_req_t argument
* Ask FUSE to call my code in a single-threaded fashion and implement my own threading setup

Of course, I'm assuming that order of operations is preserved under the two above cases - is this true?

All comments gratefully appreciated. The project I'm working on is to be licensed under the GPL for the Linux codebase.

Best regards,

Chris |
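The second option Chris lists - running single-threaded, or funnelling a multi-threaded dispatch through one's own serialization - can be sketched as follows. This is a minimal illustration in Python, not the libfuse API; the class and method names are invented for the demo. The idea is simply that every write takes one lock and draws a monotonically increasing sequence number, so the order writes reach the backing store is the order recorded.

```python
# Sketch of serializing writes with a lock and a sequence counter.
# OrderedWriter is a hypothetical stand-in for a FUSE write handler.
import threading

class OrderedWriter:
    def __init__(self):
        self._lock = threading.Lock()
        self._next_seq = 0
        self.log = []  # (seq, offset) in the order writes were applied

    def write(self, offset, data):
        with self._lock:
            seq = self._next_seq
            self._next_seq += 1
            # ... the os.pwrite() to the underlying file would go here,
            # inside the lock, so on-disk order matches the log ...
            self.log.append((seq, offset))
            return seq

w = OrderedWriter()
for off in (123, 11, 1024):   # the offsets from the example, in order
    w.write(off, b"x")
print(w.log)                  # [(0, 123), (1, 11), (2, 1024)]
```

Holding the lock across the actual write is what makes the recorded order trustworthy; logging the sequence number outside the lock would reintroduce the race.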
From: Stef B. <st...@gm...> - 2010-12-27 14:27:00
|
Hi,

Well, I guess that the order of calls is preserved. You mention the write call.

As far as I understand, a read call from an app results in a batch of read calls in the FUSE fs. When reading a whole file, or only a piece, the order in which the pieces are read is determined somewhere (I do not know where exactly, glibc??) in the calling process. When FUSE is ready with reading a batch (from a particular offset and size) it will give that back to the calling process.

In most cases the offset of the next batch read is offset = offset_previous + size_previous.

So FUSE will not mix things up, it just follows the calling process. With writing it will do the same: if the calling process first writes at offset 123, it will do that, and then at 11 and then at 1024. The only rule here is that the calling process will wait for the first write to finish. That looks almost necessary.

I'm not very familiar with AIO, but when you do AIO, the order of write calls might get mixed up. And maybe also with multithreading, I do not know.

But it should work definitely if:
- direct_io
- single thread

But key in this process in any case is not FUSE, but the calling process, I guess.

Stef |
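The batch pattern Stef describes - each read starting where the previous one ended, offset = offset_previous + size_previous - can be shown with a few lines of Python. The file name and batch size are arbitrary demo choices, not anything FUSE prescribes.

```python
# Reading a file in sequential batches: each batch's offset is the
# previous offset plus the previous batch's size.
import os, tempfile

data = b"abcdefghijklmnopqrstuvwxyz"
fd, path = tempfile.mkstemp()
os.pwrite(fd, data, 0)

batches, offset, size = [], 0, 8
while True:
    chunk = os.pread(fd, size, offset)
    if not chunk:                 # EOF: pread returned no bytes
        break
    batches.append((offset, chunk))
    offset += len(chunk)          # offset = offset_previous + size_previous

os.close(fd)
os.unlink(path)
assert b"".join(c for _, c in batches) == data
print([o for o, _ in batches])    # [0, 8, 16, 24]
```

Because each batch begins exactly where the last ended, concatenating the batches in arrival order reconstructs the file - which is why a sequential reader never sees FUSE "mix things up".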
From: Chris N. <the...@gm...> - 2010-12-27 14:49:50
|
Hi Stef,

First, thanks for your reply, especially at this time of year!

I completely agree with your thinking, and ironically you've described my problem quite well.

I'd like to be able to handle applications that are employing thread pools for IO, asynchronous IO, and applications that are just stupidly designed.

In all likelihood, read operations won't be of interest. I'll probably only need to consider writes (I'm working on a CDP/Near-CDP solution). My concern is that some silly person has implemented the following:

(For this exercise, a<b<c, x<y<z and thread scheduling occurs in the order listed below)

Thread 1: Reserve space a-b in file A.
Thread 2: Reserve space b-c in file A.
Thread 2: Write to file A in space b-c, and file B in space x-y
Thread 1: Write to file A in space a-b, and file B in space y-z

My concern is that there is some application somewhere that engages in this behavior (depending on when you decide who gets what offset, it wouldn't be too far-fetched), and I don't want to be in the position of telling poor sysadmins supporting such apps to yell at their vendor.

Do you think I can safely discount such instances?

Chris

On Tue, Dec 28, 2010 at 1:26 AM, Stef Bon <st...@gm...> wrote:
> [snip]
|
From: Stef B. <st...@gm...> - 2010-12-27 16:48:33
|
2010/12/27 Chris Nolan <the...@gm...>:
> I completely agree with your thinking, and ironically you've described my
> problem quite well.

"Ironically" - I do not understand that one. What's so ironic about it?

I'm in the middle of writing a low-level version of cddfs, a fs to access audio CDs. Especially the reading (and not the writing, of course) is causing a lot of thinking. A summary:

- the fs has to insert a wav header (or better, a so-called RIFF header) at the head of the wav file.
- the library cdda paranoia reads one sector at a time, and the cursor where the reading is on the cdrom then points at the next sector.

So when reading data, offset+size may end up somewhere in between, somewhere in the middle of a sector. It's a smart thing to keep the data just read for the next batch, just to prevent unnecessary reads and seeks.

Well, I cannot understand your problem exactly. What you describe:

----------
Thread 1: Reserve space a-b in file A.
Thread 2: Reserve space b-c in file A.
Thread 2: Write to file A in space b-c, and file B in space x-y
Thread 1: Write to file A in space a-b, and file B in space y-z
---------

is this an application?? It looks very complicated. Is the problem that the offset may be too old?

Stef |
From: Goswin v. B. <gos...@we...> - 2010-12-27 17:12:46
|
Chris Nolan <the...@gm...> writes:
> (For this exercise, a<b<c, x<y<z and thread scheduling occurs in the order
> listed below)
>
> Thread 1: Reserve space a-b in file A.
> Thread 2: Reserve space b-c in file A.
> Thread 2: Write to file A in space b-c, and file B in space x-y
> Thread 1: Write to file A in space a-b, and file B in space y-z
>
> Do you think I can safely discount such instances?

Where do you see a problem there? There are no overlaps (I'm assuming you mean a - (b-1) and b - (c-1)), so the write order doesn't come into play at all. Any order will give the same result.

Further, unless you have some networked system, the write/read ordering is done in the kernel VFS layer - and done right - and FUSE is only later called to flush the cached data to disk or read in missing data. FUSE might see the write requests to B before the ones to A, but from the application's point of view that is irrelevant.

That is, unless the system crashes after writing B but before writing A. But there is nothing you can do about that if the app is stupid and doesn't properly use fsync(), msync(), ... to ensure the right ordering against crashes.
MfG Goswin |
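Goswin's point - that non-overlapping writes produce the same file no matter which order they land in - is easy to demonstrate. The region boundaries below are arbitrary demo values standing in for the "reserved" ranges a-b and b-c.

```python
# Two writes to disjoint byte ranges: the final contents are identical
# whichever order the writes are applied in.
import os, tempfile

def run(order):
    fd, path = tempfile.mkstemp()
    writes = {"t1": (0, b"AAAA"),   # "reserved" bytes 0-3 for thread 1
              "t2": (4, b"BBBB")}   # bytes 4-7 for thread 2
    for who in order:               # apply the writes in the given order
        off, buf = writes[who]
        os.pwrite(fd, buf, off)
    result = os.pread(fd, 8, 0)
    os.close(fd)
    os.unlink(path)
    return result

assert run(["t1", "t2"]) == run(["t2", "t1"]) == b"AAAABBBB"
print("order does not matter for disjoint ranges")
```

The order only becomes observable if a crash lands between the two writes - which is exactly why the ordering guarantee, if an application needs one, has to come from fsync()/msync() rather than from the file system below it.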
From: Goswin v. B. <gos...@we...> - 2010-12-27 17:02:55
|
Chris Nolan <the...@gm...> writes:
> The project I'm working on requires that order of operations be preserved -
> if an application using the FUSE mountpoint I intend to provide writes
> offsets 123, 11 and 1024 in that order, then I must preserve that order.

Then you have already lost. The kernel already does caching, scatter-gather and reordering. If your app needs the strict ordering then it needs to open the file with O_DIRECT and/or O_SYNC.

> From my tinkering, would I be right in saying that the only ways of
> achieving this are:
>
> * Use the low level FUSE API and rely on the value of the unique field of
> the fuse_req_t argument
> * Ask FUSE to call my code in a single-threaded fashion and implement my own
> threading setup
>
> Of course, I'm assuming that order of operations is preserved under the two
> above cases - is this true?

The order is always preserved to the extent promised by the flags your app used when opening the files - i.e. usually not at all. If the app uses O_DIRECT/O_SYNC then FUSE will simply not see multiple requests in parallel (from one thread of one app). If it does not, then the order is already indeterminate before FUSE gets involved.

MfG
        Goswin |
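The flag Goswin mentions can be shown in a few lines. With O_SYNC, each write only returns once the data has reached stable storage, so a single thread's writes are ordered on disk exactly as issued. (O_DIRECT additionally bypasses the page cache, but imposes buffer-alignment requirements, so it is left out of this sketch; the offsets are the ones from Chris's example.)

```python
# Writes through an O_SYNC descriptor complete synchronously, in issue
# order, at the offsets given to pwrite().
import os, tempfile

d = tempfile.mkdtemp()
path = os.path.join(d, "demo")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
for off, buf in ((123, b"first"), (11, b"second"), (1024, b"third")):
    os.pwrite(fd, buf, off)   # returns only after the data is durable
os.close(fd)

with open(path, "rb") as f:
    content = f.read()
assert content[123:128] == b"first" and content[11:17] == b"second"
os.unlink(path)
os.rmdir(d)
print("synchronous writes completed in issue order")
```

Without O_SYNC, the same three pwrite() calls would land in the page cache and could be flushed to the FUSE daemon in any order - which is Goswin's point: the ordering contract lives in the open() flags, not in the file system.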