From: Mike S. <ma...@gm...> - 2013-04-02 14:58:40
Hi Goswin,

On Tue, Apr 2, 2013 at 9:41 AM, Goswin von Brederlow <gos...@we...> wrote:

> On Sun, Mar 31, 2013 at 02:44:10PM -0400, Mike Shal wrote:
> > Hello again, hope you don't mind revisiting this topic, but I have an
> > example patch and some more benchmarks...
> >
> > Here are a few other examples:
> >
> > 1) Large ~3GB read (cat bigfile.txt > /dev/null)
> > native fs: 0.279s
> > fuse: 1.392s (~5x slower)
> > fuse passthrough: 0.279s (no difference!)
> >
> > 2) Large (100MB) write (dd bs=1M count=100 if=/dev/zero of=outfile)
> > native fs: 0.048s
> > fuse: 0.609s (~12x slower)
> > fuse passthrough: 0.048s (no difference!)
> >
> > Note that in all cases, the speed of the underlying disk is irrelevant
> > since everything is cached.
> >
> > I think this is significant enough to warrant adding the functionality to
> > FUSE.
> >
> > >
> > > And the performance of fuse can be improved further. For example Pavel
> > > Emelyanov is working on a patchset that allows the kernel to cache
> > > writes, just like any other filesystem, bringing the cached write
> > > performance up to the baseline you measured.
> > >
> >
> > I'd be happy to perform other tests if you can provide some details on how
> > to run them (changes to fusexmp_fh). I don't see how caching writes would
> > help for cases like this though - read performance is also a major concern.
>
> So how much faster does fuse get with big writes (and I mean 128k or
> more here) and with splice operations for the same tests?
>

Here are my results:

A) ./fusexmp_fh -obig_writes
1) link test: 45.149s (~2 second improvement, still 137% longer than native)
2) read test: no change
3) write test: 0.173s (now 3.5x slower, rather than 12x slower)

So it seems for the case I really care about (the end-to-end linking time),
writing is a small portion of the total time. However, it does speed up the
write-only test significantly using a 128k buffer instead of the default 4k
buffer. It is still 3.5x slower, whereas with the passthrough implementation
it achieves native speeds.

B) ./fusexmp_fh -osplice_write -osplice_read
1) link test: 47.339s (no real change over the default fuse)
2) read test: 0.656s (twice as fast as default fuse, but still twice as slow
as native)
3) write test: 0.545s (slightly better than default fuse, but still 11x
slower than native)

I also tried with -osplice_move, but for some reason that makes all reads
pull from the disk rather than the cache. This makes the link test and read
test pretty abysmal:

C) ./fusexmp_fh -osplice_move -osplice_write -osplice_read
1) link test: 1m0.154s
2) read test: 7.536s

I don't really know what's going on there, though (maybe I'm using it wrong?)

In all, it seems these options help a little bit, but nowhere near as much
as a passthrough implementation. Any other thoughts / suggestions to try?

Thanks,
-Mike
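
For anyone who wants to check the "Nx slower" figures quoted in this thread, here is a minimal sketch of the arithmetic. The helper name `slowdown` is hypothetical (it is not part of FUSE or fusexmp_fh); it just divides a measured time by the native baseline, both taken from the numbers above.

```shell
#!/bin/sh
# Hypothetical helper (not part of FUSE): print how many times slower a
# measured wall-clock time is than the native baseline, to one decimal.
# Usage: slowdown NATIVE_SECONDS MEASURED_SECONDS
slowdown() {
    awk -v native="$1" -v measured="$2" 'BEGIN {
        # Refuse a zero or negative baseline rather than divide by it.
        if (native <= 0) exit 1
        printf "%.1f\n", measured / native
    }'
}

# Timings copied from the thread (seconds).
slowdown 0.279 1.392   # read test: native vs default fuse
slowdown 0.048 0.609   # write test: native vs default fuse
slowdown 0.048 0.173   # write test: native vs fuse with -obig_writes
slowdown 0.279 0.656   # read test: native vs fuse with splice options
slowdown 0.048 0.545   # write test: native vs fuse with splice options
```

The link-test ratio is not included because the native link time is not stated in this message.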