From: John B. <joh...@la...> - 2010-09-30 16:14:29
Excerpts from Miklos Szeredi's message of Thu Sep 30 02:52:36 -0600 2010:
> On Wed, 29 Sep 2010, John Bent wrote:
> > Excerpts from John Bent's message of Wed Sep 29 09:35:01 -0600 2010:
> > > Excerpts from Goswin von Brederlow's message of Wed Sep 29 01:55:58 -0600 2010:
> > > > So write locks the file so no 2 writes (overlapping or not) happen
> > > > concurrently. On the other hand reads are safe to happen concurrently
> > > > all the time so no lock is required.
> > > >
> > > > One test that is missing is doing read and write in parallel. The
> > > > expected pattern would be to have reads concurrently, then stop for a
> > > > single write, and then reads concurrently again.
> > > >
> > > > If you want to patch this you will have to maintain a list of byte
> > > > ranges for each write and make sure no 2 concurrent writes overlap.
> > > >
> > > Why not just pass them through concurrently and allow the file
> > > system to handle them as it deems fit? That's what we'd prefer.
> > >
> > Just to expand a bit. We prefer this because we're doing concurrent
> > writes across an entire cluster to a single file. So protecting on each
> > individual node for overlapped writes is redundant since the underlying
> > file system must already handle this across the entire cluster. So FUSE
> > doing it gets us nothing but costs a fair bit of complexity within FUSE.
>
> I think there's a fair bit of confusion here.
>
> In linux concurrent writes to a single file are serialized by
> inode->i_mutex for various reasons.
>
> But writes are normally asynchronous, i.e. the write(2) system call
> returns immediately after the data has been buffered. This makes the
> system call efficient and the serialization isn't normally a problem.
>
> In fuse write requests are currently synchronous, the write(2) system
> call won't return until the fuse filesystem finishes its "write"
> operation.
>
> So what you are probably seeing is the linux kernel serializing
> write requests. There are two fixes for that:
>
> 1) implement buffered writes in the fuse kernel module
>
> 2) implement buffering in your filesystem
>

Thanks Miklos; we didn't realize it was the kernel serializing. It
seems easy enough for us to do aio in our daemons and make sure they're
flushed, although this does add a memcopy. Oh well.

--
Thanks,
John
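For anyone following the thread, here is a rough sketch of what option 2 (buffering in the filesystem) could look like. It is only an illustration under stated assumptions, not what our daemons actually do: the names write_buffer, wb_init, wb_write and wb_flush are made up for this example and are not part of the FUSE API. The idea is that the "write" handler copies the data (the extra memcopy mentioned above) and returns right away, so the serialized write(2) call finishes quickly; the real I/O happens later when the queue is flushed. A real daemon would presumably drain the queue from a background thread or with POSIX aio rather than only on an explicit flush.

```c
#define _XOPEN_SOURCE 700 /* for pwrite() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* One buffered write: a private copy of the caller's data and its offset. */
struct pending_write {
    char *data;
    size_t len;
    off_t offset;
    struct pending_write *next;
};

/* Hypothetical per-file write-behind queue (not part of the FUSE API). */
struct write_buffer {
    int fd;
    struct pending_write *head;
    struct pending_write *tail;
};

void wb_init(struct write_buffer *wb, int fd)
{
    assert(wb != NULL);
    wb->fd = fd;
    wb->head = NULL;
    wb->tail = NULL;
}

/* Copy the data into the queue and return at once, without touching the
 * file. Returns len on success, -1 if memory could not be allocated. */
ssize_t wb_write(struct write_buffer *wb, const void *buf, size_t len,
                 off_t offset)
{
    struct pending_write *pw = malloc(sizeof *pw);
    if (pw == NULL)
        return -1;
    pw->data = malloc(len ? len : 1);
    if (pw->data == NULL) {
        free(pw);
        return -1;
    }
    memcpy(pw->data, buf, len); /* the extra copy that buffering costs */
    pw->len = len;
    pw->offset = offset;
    pw->next = NULL;
    if (wb->tail != NULL)
        wb->tail->next = pw;
    else
        wb->head = pw;
    wb->tail = pw;
    return (ssize_t)len;
}

/* Issue every queued write in arrival order, then empty the queue.
 * Returns 0 on success, -1 if any pwrite() failed; after a failure the
 * remaining entries are discarded without being written. */
int wb_flush(struct write_buffer *wb)
{
    int rc = 0;
    while (wb->head != NULL) {
        struct pending_write *pw = wb->head;
        size_t done = 0;
        while (rc == 0 && done < pw->len) {
            ssize_t n = pwrite(wb->fd, pw->data + done, pw->len - done,
                               pw->offset + (off_t)done);
            if (n <= 0)
                rc = -1;
            else
                done += (size_t)n;
        }
        wb->head = pw->next;
        free(pw->data);
        free(pw);
    }
    wb->tail = NULL;
    return rc;
}
```

In a filesystem built this way, the write handler would call wb_write() and the flush, fsync and release handlers would call wb_flush(). One consequence worth noting is that write errors only become visible at flush time.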