Re: [fuse-devel] bypassing read/write for mirror fs

Hi Goswin,

On Tue, Apr 2, 2013 at 9:41 AM, Goswin von Brederlow <gos...@we...> wrote:

> On Sun, Mar 31, 2013 at 02:44:10PM -0400, Mike Shal wrote:
> > Hello again, hope you don't mind revisiting this topic, but I have an
> > example patch and some more benchmarks...
> >
> > Here are a few other examples:
> >
> > 1) Large ~3GB read (cat bigfile.txt > /dev/null)
> > native fs: 0.279s
> > fuse: 1.392s (~5x slower)
> > fuse passthrough: 0.279s (no difference!)
> >
> > 2) Large (100MB) write (dd bs=1M count=100 if=/dev/zero of=outfile)
> > native fs: 0.048s
> > fuse: 0.609s (~12x slower)
> > fuse passthrough: 0.048s (no difference!)
> >
> > Note that in all cases, the speed of the underlying disk is irrelevant
> > since everything is cached.
> >
> > I think this is significant enough to warrant adding the functionality to
> > FUSE.
> >
> >
> > >
> > > And the performance of fuse can be improved further.  For example Pavel
> > > Emelyanov is working on a patchset that allows the kernel to cache
> > > writes, just like any other filesystem, bringing the cached write
> > > performance up to the baseline you measured.
> > >
> >
> > I'd be happy to perform other tests if you can provide some details on how
> > to run them (changes to fusexmp_fh). I don't see how caching writes would
> > help for cases like this though - read performance is also a major concern.
>
> So how much faster does fuse get with big writes (and I mean 128k or
> more here) and with splice operations for the same tests?
>
>
Here are my results:

A) ./fusexmp_fh -obig_writes
1) link test: 45.149s (~2 second improvement, still 137% longer than native)
2) read test: no change
3) write test: 0.173s (now 3.5x slower, rather than 12x slower)

So it seems that for the case I really care about (the end-to-end linking
time), writing is a small portion of the total time. However, using a 128k
buffer instead of the default 4k buffer does speed up the write-only test
significantly (0.609s down to 0.173s). That is still 3.5x slower than
native, whereas the passthrough implementation achieves native speed.
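
As a rough illustration of why the larger buffer helps the write test, here
is a small sketch that counts how many write requests the 100MB dd would
generate at each buffer size. It assumes every request carries a full buffer
and that "100MB" means 100 MiB (dd bs=1M count=100); request_count is a
made-up helper name, not anything in libfuse.

```python
def request_count(total_bytes: int, max_write: int) -> int:
    """Requests needed if each one carries at most max_write bytes (ceiling division)."""
    return -(-total_bytes // max_write)


# Size written by: dd bs=1M count=100 if=/dev/zero of=outfile
TOTAL = 100 * 1024 * 1024

if __name__ == "__main__":
    for label, size in (("4k (default)", 4 * 1024), ("128k (big_writes)", 128 * 1024)):
        print(f"{label}: {request_count(TOTAL, size)} write requests")
```

Under those assumptions the request count drops by a factor of 32, while the
measured time only drops by about 3.5x, so the per-request round trip is
probably not the only cost involved.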

B) ./fusexmp_fh -osplice_write -osplice_read
1) link test: 47.339s (no real change over the default fuse)
2) read test: 0.656s (twice as fast as default fuse, but still twice as
slow as native)
3) write test: 0.545s (slightly better than default fuse, but still 11x
slower than native)

I also tried with -osplice_move, but for some reason that makes all reads
pull from the disk rather than the cache. This makes the link test and read
test pretty abysmal:

C) ./fusexmp_fh -osplice_move -osplice_write -osplice_read
1) link test: 1m0.154s
2) read test: 7.536s

I don't really know what's going on there, though (maybe I'm using it
wrong?)

In all, it seems these options help a little bit, but nowhere near as much
as a passthrough implementation.
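
To put the read and write numbers side by side, here is a minimal sketch
that recomputes the slowdown factors from the timings quoted in this thread.
Only values reported explicitly are included (the big_writes read test was
just "no change", so it is left out); the dictionary layout and the slowdown
helper are purely illustrative.

```python
# Timings in seconds, copied from the results reported in this thread.
NATIVE = {"read": 0.279, "write": 0.048}

RESULTS = {
    "default fuse": {"read": 1.392, "write": 0.609},
    "big_writes": {"write": 0.173},
    "splice_read + splice_write": {"read": 0.656, "write": 0.545},
    "passthrough": {"read": 0.279, "write": 0.048},
}


def slowdown(config: str, test: str) -> float:
    """Ratio of the measured time to the native time for one test."""
    return RESULTS[config][test] / NATIVE[test]


if __name__ == "__main__":
    for config, timings in RESULTS.items():
        for test in timings:
            print(f"{config:28s} {test:5s} {slowdown(config, test):5.1f}x native")
```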

Any other thoughts / suggestions to try?

Thanks,
-Mike