From: Michael M. <Mic...@cs...> - 2011-05-14 21:13:58
Hi,

I've been looking at the performance of aifffffs and hit an interesting problem. Non-linear reads (i.e. random access) incur a performance hit because the file system has to re-initialise the underlying FLAC decoding library.

Oddly, I noticed that linear access through "cat" or "dd" can end up as non-linear access when the FUSE mount uses asynchronous reads. What I've found is that with threading enabled, FUSE may issue more than one read() request at a time, as expected. These requests can be verified as always being in order, so no problem there. However, internally aifffffs uses a pthread_mutex to protect each file's FLAC decoder context, and it is this mutex which doesn't provide FIFO ordering. The read requests therefore get handled out of order by the decoder, which then has to seek, with a measurable loss in performance.

If I use synchronous reads (e.g. by mounting with -s), all reads are handled in order and no seeking is required. In this mode, linear file access is faster, and the difference is measurable even on small data sets, e.g.

Single threaded synchronous-read mount:

  $ for f in 1 2 3 ; do (time cat mnt/*.aiff > /dev/null ) 2>&1 | grep real; done
  real 0m4.279s
  real 0m4.282s
  real 0m4.270s

Multi-threaded asynchronous-read mount:

  $ for f in 1 2 3 ; do (time cat mnt/*.aiff > /dev/null ) 2>&1 | grep real; done
  real 0m5.850s
  real 0m6.138s
  real 0m5.978s

Mounting single threaded is no problem, except that now reads of different files are also serialised. So accessing files in parallel takes a hit on a multi-core system, e.g.
Single threaded synchronous-read mount:

  $ for f in 1 2 3 ; do (time find mnt/ -name "*.aiff" -print0 | xargs -0 -n1 -P10 cat > /dev/null ) 2>&1 | grep real; done
  real 0m4.622s
  real 0m4.471s
  real 0m4.462s

Multi-threaded asynchronous-read mount:

  $ for f in 1 2 3 ; do (time find mnt/ -name "*.aiff" -print0 | xargs -0 -n1 -P10 cat > /dev/null ) 2>&1 | grep real; done
  real 0m2.887s
  real 0m2.739s
  real 0m2.845s

(Note: I can replace cat with md5sum in these tests to ensure the data being processed is identical, although that obviously has more CPU overhead.)

A mutex (or maybe a condition variable) with FIFO ordering would fix this problem, but that doesn't look possible, or at least not easy to work around (basically you can't use a plain pthread_mutex any more, and using a pipe or similar implies additional context switches).

So I'm wondering if it is possible to make FUSE use synchronous reads within a single file context (i.e. a lock per struct fuse_file_info) while still multi-threading across the file system? Or maybe there is some other simple solution I'm overlooking?

Regards,
Mike
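
For reference, the FIFO-ordered lock discussed above could be sketched as a "ticket lock" built from a pthread mutex plus a condition variable. This is only a minimal sketch under stated assumptions, not aifffffs or FUSE code: fifo_lock_t, fifo_lock_acquire(), fifo_lock_release() and fifo_lock_demo() are hypothetical names. It serves threads in the order they reach the lock, which matches request order only if the FUSE worker threads arrive at the lock in the order the requests were issued, and that is an assumption. Waiters still sleep on the condition variable, so it does not avoid context switches.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical FIFO ("ticket") lock; not part of pthreads or FUSE. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    unsigned long   next_ticket;  /* next ticket to hand out */
    unsigned long   now_serving;  /* ticket currently allowed to proceed */
} fifo_lock_t;

static void fifo_lock_init(fifo_lock_t *l) {
    pthread_mutex_init(&l->mutex, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->next_ticket = 0;
    l->now_serving = 0;
}

static void fifo_lock_acquire(fifo_lock_t *l) {
    pthread_mutex_lock(&l->mutex);
    unsigned long my_ticket = l->next_ticket++;   /* join the queue */
    while (my_ticket != l->now_serving)           /* wait for our turn */
        pthread_cond_wait(&l->cond, &l->mutex);
    pthread_mutex_unlock(&l->mutex);
}

static void fifo_lock_release(fifo_lock_t *l) {
    pthread_mutex_lock(&l->mutex);
    l->now_serving++;                             /* admit the next ticket */
    pthread_cond_broadcast(&l->cond);             /* waiters re-check their ticket */
    pthread_mutex_unlock(&l->mutex);
}

/* --- small demonstration: workers must get the lock in queueing order --- */
#define NTHREADS 4

typedef struct {
    fifo_lock_t *lock;
    int id;
    int *order;   /* shared log of who got the lock, in order */
    int *count;   /* number of entries in the log */
} worker_arg_t;

static void *worker(void *p) {
    worker_arg_t *a = p;
    fifo_lock_acquire(a->lock);
    a->order[(*a->count)++] = a->id;              /* protected by the FIFO lock */
    fifo_lock_release(a->lock);
    return NULL;
}

static unsigned long tickets_issued(fifo_lock_t *l) {
    pthread_mutex_lock(&l->mutex);
    unsigned long n = l->next_ticket;
    pthread_mutex_unlock(&l->mutex);
    return n;
}

/* Returns 1 if the workers were served in the order they queued, else 0. */
int fifo_lock_demo(void) {
    fifo_lock_t lock;
    pthread_t tid[NTHREADS];
    worker_arg_t args[NTHREADS];
    int order[NTHREADS], count = 0, ok = 1;

    fifo_lock_init(&lock);
    fifo_lock_acquire(&lock);                     /* hold the lock so workers queue up */
    for (int i = 0; i < NTHREADS; i++) {
        args[i] = (worker_arg_t){ &lock, i, order, &count };
        int rc = pthread_create(&tid[i], NULL, worker, &args[i]);
        assert(rc == 0);
        (void)rc;
        while (tickets_issued(&lock) < (unsigned long)i + 2)
            sched_yield();                        /* wait until worker i holds a ticket */
    }
    fifo_lock_release(&lock);                     /* let the queue drain */
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < NTHREADS; i++)
        if (order[i] != i)
            ok = 0;
    pthread_cond_destroy(&lock.cond);
    pthread_mutex_destroy(&lock.mutex);
    return ok;
}
```

In a file system like this one, such a lock would be held per open file around the decoder call in the read() handler, in place of the plain pthread_mutex. Build with -pthread.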