[fuse-devel] Threading and performance

Hi,

I've been looking at the performance of aifffffs and hit an interesting
problem.

Non-linear reads (i.e. random access) incur a performance hit because
the file system has to re-initialise the underlying FLAC decoding
library on each seek.  Oddly, I noticed that linear accesses through
"cat" or "dd" can end up as non-linear accesses when the FUSE mount uses
asynchronous reads.

What I've found is that with threading enabled, FUSE may have more than
one read() request in flight - as is expected.  These requests can be
verified to always arrive in order, so no problem there.  However,
internally aifffffs uses a pthread_mutex to protect each file's FLAC
decoder context, and it is this mutex which doesn't provide FIFO
ordering.  Therefore the read requests get handled out of order by the
decoder, and the resulting seeks cause a measurable loss in performance.

If I use synchronous reads (e.g. by mounting with -s), all reads are
handled in order and no seeking is required.  In this mode, linear file
access is faster, and the difference is measurable even on small data
sets.

e.g.
Single threaded synchronous-read mount:
$ for f in 1 2 3 ; do (time cat mnt/*.aiff  > /dev/null ) 2>&1 | grep real; done
real	0m4.279s
real	0m4.282s
real	0m4.270s

Multi-threaded asynchronous-read mount:
$ for f in 1 2 3 ; do (time cat mnt/*.aiff  > /dev/null ) 2>&1 | grep real; done
real	0m5.850s
real	0m6.138s
real	0m5.978s

Mounting single threaded is no problem in itself, except that reads of
different files are now serialised as well.  So accessing files in
parallel takes a hit on a multi-core system:

e.g.
Single threaded synchronous-read mount:
$ for f in 1 2 3 ; do (time find mnt/ -name "*.aiff" -print0 | xargs -0 -n1 -P10 cat > /dev/null ) 2>&1 | grep real; done
real	0m4.622s
real	0m4.471s
real	0m4.462s

Multi-threaded asynchronous-read mount:
$ for f in 1 2 3 ; do (time find mnt/ -name "*.aiff" -print0 | xargs -0 -n1 -P10 cat > /dev/null ) 2>&1 | grep real; done
real	0m2.887s
real	0m2.739s
real	0m2.845s

(Note, I can replace cat with md5sum in these tests to ensure the data
being processed is identical, although that obviously has more CPU
overhead).

A mutex (or maybe a cond variable) with FIFO ordering would fix this
problem, but that doesn't look possible or easy to work around
(basically you can't use a plain pthread_mutex anymore, and using a pipe
or similar implies additional context switches).
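
For reference, here is a rough sketch of what I mean by a FIFO lock: a
"ticket" lock built from a pthread_mutex plus a cond variable.  The
names (fifo_lock_t, fifo_lock_acquire, etc.) are made up for
illustration and are not part of aifffffs or FUSE.  It assumes that
threads reach fifo_lock_acquire() in the order we want them served,
since the short internal mutex is still unordered, and the broadcast on
release wakes every waiter, so it is not free of extra context switches
either.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical FIFO ("ticket") lock; illustrative names only. */
typedef struct {
    pthread_mutex_t mutex;        /* protects the two counters below */
    pthread_cond_t  cond;         /* signalled whenever now_serving changes */
    unsigned long   next_ticket;  /* next ticket number to hand out */
    unsigned long   now_serving;  /* ticket currently allowed to proceed */
} fifo_lock_t;

static void fifo_lock_init(fifo_lock_t *l)
{
    int rc = pthread_mutex_init(&l->mutex, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&l->cond, NULL);
    assert(rc == 0);
    (void)rc;
    l->next_ticket = 0;
    l->now_serving = 0;
}

static void fifo_lock_acquire(fifo_lock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    /* Take a ticket; callers are served strictly in ticket order. */
    unsigned long my_ticket = l->next_ticket++;
    while (my_ticket != l->now_serving)
        pthread_cond_wait(&l->cond, &l->mutex);
    pthread_mutex_unlock(&l->mutex);
}

static void fifo_lock_release(fifo_lock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    l->now_serving++;
    /* Wake all waiters; only the holder of the next ticket proceeds. */
    pthread_cond_broadcast(&l->cond);
    pthread_mutex_unlock(&l->mutex);
}
```

The read handler would then call fifo_lock_acquire() on the file's lock
before touching the decoder context and fifo_lock_release() afterwards,
in place of the pthread_mutex_lock()/pthread_mutex_unlock() pair.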

So I'm wondering if it is possible to make FUSE use synchronous reads
within a single file context (i.e. lock per struct fuse_file_info) but
multi-threading across the file system?

Or maybe there is some other simple solution I'm overlooking?

Regards,

Mike