From: Douglas G. H. <dou...@gm...> - 2011-07-22 11:14:02
Hello again FUSE experts,

A while ago I enquired about using FUSE as a file system replication engine (over ext3/4). I thought I had satisfied myself that this was a sensible thing to do. However, based on some recent threads, I fear that there might be a fly in the ointment. I note that O_DIRECT is not available in fcntl() and is only available in open() via a recent patch. I also note some queries about POSIX ACL support.

So my question is this: if I am using FUSE as a pass-through layer over an Ext file system, what is the risk that I introduce some non-POSIX IO behaviour? My query is focused on any inherent POSIX compliance limitations within FUSE rather than on my ability to code it correctly. I don't want to break any applications that work fine on Ext3 but would break when run on top of my FUSE-based replication engine.

Thanks in advance for your assistance.

Regards,
Douglas.

On 10 June 2011 11:58, Goswin von Brederlow <gos...@we...> wrote:

> "Douglas G. Hanley" <dou...@gm...> writes:
>
> > Hi FUSE Experts,
> >
> > I am thinking about developing a Linux file system replication engine using
> > FUSE. The general idea is this:
> >
> > - Assume a starting point of two Linux filesystems (let's say ext3) which
> >   have been magically synchronised between two servers (source and target).
>
> Which you pretty much have to take at face value or check every single
> file (which would take way too long). This part is the tricky part,
> especially after a crash. Unless you use journaling or something you
> will get differences after a crash.
>
> > - Based on a specification of the directories and files one wishes to
> >   replicate (let's say a regular expression), use FUSE to mount the required
> >   parent directories on the source file system and start to intercept file
> >   updates and modifications
> > - Ship these intercepted file updates and modifications to the target
> >   server to apply the same updates and modifications to that filesystem to
> >   keep them perfectly in sync
>
> Keep it simple and flexible. Mount the remote via NFS (or smb or
> whatever) so it is just another local directory. Then run your fuse
> filesystem with directories (can be more than 2) and let it replicate
> between all directories given. E.g. something like this:
>
> replicate --pattern "*.c" --pattern "*.h" /mnt/replicated /mnt/local /mnt/server1 /mnt/server2
>
> > - Of course, the intercepted updates are passed through to the source file
> >   system as well, either before sending to the target server (asynchronous
> >   replication) or after sending to the target server (synchronous replication)
> > - This needs to work for all possible IO update operations that might be
> >   exercised on the source server
> >
> > I have done some research and FUSE seems like an excellent platform to achieve
> > this, but I have a few newbie questions that I would appreciate some help
> > with:
> >
> > - Will FUSE support interception of memory mapped IO? I suspect this is
> >   handled but I don't see any hooks for this in the API. Is support hidden
> >   behind the API somehow?
>
> mmap() is handled inside the kernel and fuse will just see read/write
> requests for the relevant pages.
>
> > - Will FUSE support interception of direct IO? I suspect the answer is
> >   yes but was wondering if there are any caveats I need to be aware of?
>
> Yes. There are some issues with implementing it right though. Never
> cared about this myself.
>
> > - Programmatically, can I mount more than one point in the source
> >   filesystem from a single executable? I believe I can, but do I definitely
> >   need to have a thread for each mount?
>
> Keep it simple. Do the mounting before starting your filesystem and
> start it with simple directories as arguments.
>
> > - I am pretty sure that all of my code in the FUSE callbacks needs to be
> >   reentrant. Can you confirm that this is the case, i.e. is there any
> >   serialisation built into the API somewhere, on some callbacks?
>
> That totally depends on what you code. The single threaded loop is totally
> serialized. The multithreaded loop is serialized within one thread. So
> you need to be thread safe for that but not reentrant. Or if you write
> your own low level loop you can make it reentrant.
>
> But I don't really see where that question even arises. Your filesystem
> is just an extension of the example fuse filesystem and will be just
> naturally reentrant anyway.
>
> > - I plan to be able to replicate updates synchronously and
> >   asynchronously. I don't foresee any issues with this, do you?
>
> Careful with race conditions, and with asynchronous updates be extra
> careful of crashes.
>
> > - Are there any other IO update operations (I have not mentioned) that
> >   FUSE definitely can't see?
> > - Are there any implementation restrictions within FUSE which would
> >   compromise my ability to intercept absolutely all updates of any nature
> >   and/or size?
>
> No. The kernel can not access your filesystem other than going through
> fuse.
>
> Which doesn't mean something can't access the underlying source
> filesystems directly. You have to protect them yourself.
>
> > Thanks very much for any help you can provide on these questions.
> >
> > Regards,
>
> MfG
> Goswin

--
Douglas.
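
The "replicate between all directories given" idea from the quoted reply can be sketched roughly as follows. This is only an illustration under stated assumptions, not FUSE code: `Replicator`, `wants` and `write` are hypothetical names, the first directory is assumed to be the local source, and a real filesystem would call something like `write` from its own write callback. The lock reflects the point that callbacks run from a multithreaded loop need to be thread safe, and each replica is written and synced before returning, which corresponds to the synchronous mode discussed in the thread.

```python
import fnmatch
import os
import threading


class Replicator:
    """Hypothetical helper: applies one write to every backing directory."""

    def __init__(self, backends, patterns):
        self.backends = list(backends)  # assumption: first entry is the local source
        self.patterns = list(patterns)  # shell-style patterns such as "*.c"
        self._lock = threading.Lock()   # serialises writes issued from several threads

    def wants(self, path):
        """Return True if the file name matches any replication pattern."""
        name = os.path.basename(path)
        return any(fnmatch.fnmatch(name, p) for p in self.patterns)

    def write(self, path, offset, data):
        """Write data at offset; matching files go to all directories."""
        rel = path.lstrip("/")  # filesystem paths arrive rooted at "/"
        targets = self.backends if self.wants(rel) else self.backends[:1]
        with self._lock:
            for root in targets:
                full = os.path.join(root, rel)
                os.makedirs(os.path.dirname(full), exist_ok=True)
                fd = os.open(full, os.O_WRONLY | os.O_CREAT, 0o644)
                try:
                    view = memoryview(data)
                    done = 0
                    while done < len(view):  # pwrite may write fewer bytes than asked
                        done += os.pwrite(fd, view[done:], offset + done)
                    os.fsync(fd)  # flush this copy before moving to the next one
                finally:
                    os.close(fd)
        return len(data)
```

With three existing directories, `Replicator(["/mnt/local", "/mnt/server1", "/mnt/server2"], ["*.c", "*.h"]).write("/src/a.c", 0, b"int x;")` would write the same bytes under each of them, while a file that matches no pattern is written to the first directory only. Crash recovery and asynchronous shipping, the hard parts flagged in the reply, are deliberately left out.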