From: Bradley W. S. <set...@or...> - 2009-11-06 19:20:02
On 11/06/2009 11:40 AM, Goswin von Brederlow wrote:
> "Bradley W. Settlemyer"<set...@or...> writes:
>
>> Performance. I want to write the data out in parallel -- a special
>> parallel file system implemented in fuse. I could build my own
>> buffering, but then a 128MB write could require as much as 256MB of RAM.
>
> Actually it would cost 128MB + <size of write (128k)> * <num of
> parallel requests (10)>. That is 1% overhead.
>
> But it also costs time to copy the data between the fuse buffer and
> your own cache. I'm currently (again, better this time) trying to add
> better buffer lifetimes to libfuse.

Hmm, what am I missing? Say I have 8 threads, and each wants to operate
on 16MB. I have to accumulate 128MB of data before any thread will begin
releasing data, while the client still has the 128MB pending on the write
call in his own buffer. So that comes to twice the memory cost (minus,
perhaps, the last 128K, which may be shared if I use a writev-type
technique to send the data). Is fuse able to optimize away this cost
somehow?

Now, I'm not afraid to get a bit complicated. What I could do is
accumulate the pointers to the buffers, assuming that direct_io gives me
the same pointer that exists in the client's userspace, and then use a
writev call to push a large amount of data across the network. The
problem is that I can't tell the difference between 128MB writes and
128K writes, and I would violate POSIX semantics on the latter to speed
up performance on the former.

I'm also not clear on when I need to copy the buffers fuse gives me and
when I can get away with just continuing to use the buffer. Not a major
deal for me, but a headache for my users that run 3rd-party code. If
they actually need to do a small write, I would like to offer them the
correct semantics if possible.

Note, we have prototyped our threaded write technique, and we need the
larger buffers to truly achieve the performance we desire. But we will
take what we can get, I suppose.

Cheers,
Brad