From: 多. <390...@qq...> - 2013-11-10 09:58:55
hello coders,

I tested the write speed of FUSE and got 50 MB/s to disk. I also tested the write speed of ext4 and got 100 MB/s to disk. FUSE is slow; how can we raise its speed?

ext4: write -> vfs.write -> ext4.write
fuse: write -> vfs.write -> fuse (in kernel) recv -> fuse (in user space) recv -> fuse.write

An ext4 write memcpys the data twice, but FUSE may copy it more times, maybe 3 or 4? I am not sure. How can we reduce the number of memcpys? I think that if the data were copied only twice, we would get performance like ext4 and other native filesystems, and we could do more things.
From: Goswin v. B. <gos...@we...> - 2013-11-11 14:55:49
On Sun, Nov 10, 2013 at 05:58:43PM +0800, ???????????? wrote:
> hello coders.
>
> I test write speed of fuse , raise 50MB/s on disk.
> and I test write speed of ext4 too, raise 100MB/s on disk.
>
> fuse is slow, and how can we raise speed.
>
> ext4: write->vfs.write->ext4.write
> fuse: write->vfs.write->fuse(in kernel) recv -> fuse (in user space) recv-> fuse.write

Use splice mode in fuse.

> ext4 write use memcpy data twice.
> but fuse may be more times, 3 or 4 times ? I don't sure.
>
> how to reduce memcpy data times.
> and i think , if memcpy data just happened 2 times.
>
> we will get the performance like ext4 and more native fs.

Is your CPU so slow that a few memcpy calls cost you 50% of your performance? Is fuse using 100% CPU? I would think you are blaming the wrong thing.

Check the chunk size of the writes you get. Use big writes, set the optimal block size for the FS to at least 64-128k, and use something that honors that.

MfG
        Goswin
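[Editor's note: Goswin's chunk-size advice can be tried from the shell. A rough sketch, where `./myfs` and the mount point are hypothetical placeholders, not from the thread; the libfuse 2.x `big_writes` and `max_write` options are real, and the dd lines show how the same amount of data turns into far fewer write() calls when the chunk size grows:]

```shell
# Mount with larger write chunks (libfuse 2.x; ./myfs and /mnt/myfs are placeholders):
#   ./myfs /mnt/myfs -o big_writes,max_write=131072

# Same 10 MB of data, very different numbers of write() calls:
# 2560 calls of 4 KiB vs 80 calls of 128 KiB.
dd if=/dev/zero of=/tmp/dd_small.img bs=4k count=2560 2>&1 | tail -n 1
dd if=/dev/zero of=/tmp/dd_large.img bs=128k count=80 2>&1 | tail -n 1
```

On a FUSE mount each write() is at least one round trip to the userspace daemon, so fewer, larger chunks usually translate directly into higher throughput.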
From: Mike S. <ma...@gm...> - 2013-11-12 22:44:11
On Mon, Nov 11, 2013 at 9:55 AM, Goswin von Brederlow <gos...@we...> wrote:
> On Sun, Nov 10, 2013 at 05:58:43PM +0800, ???????????? wrote:
> > ext4: write->vfs.write->ext4.write
> > fuse: write->vfs.write->fuse(in kernel) recv -> fuse (in user space) recv-> fuse.write
>
> Use splice mode in fuse.

Last time I looked into splice mode, it was slightly worse than the default fusexmp_fh (my use case was linking libxul from Firefox inside a FUSE file system):

native: 18.986s
passthrough: 24.754s
-obig_writes: 45.149s
-osplice_write -osplice_read -obig_writes: 45.622s
fusexmp_fh defaults: 47.232s
-osplice_write -osplice_read: 47.339s

This was back in April: http://article.gmane.org/gmane.comp.file-systems.fuse.devel/12857/

Have things changed since then that make splice worth looking at again?

> Is your CPU so slow that a few memcpy cost you 50% performance? Is
> fuse using 100% cpu? I would think you are blaming the wrong thing.
>
> Check the chunk size of writes you get. Use big writes. And set the
> optimal block size for the FS to 64-128k at least and use something
> that honors that.

Big writes will definitely help in a simple test like what the OP is doing, since it naturally cuts down on the number of calls to/from FUSE. Based on the profiling I did a while back, it's those calls that are the actual bottleneck, not the time to do an extra memcpy.
One other thing you can try is to look at your CPU governor. If it's "ondemand", it won't play nicely with a FUSE fs: the constant back-and-forth between your write process and your FUSE process seems to confuse the kernel and prevents it from scaling up the CPU. Try setting it to "performance" instead. I use these bash functions:

function setgov() {
    echo "$1" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
}

function getgov() {
    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
}

E.g.:

$ getgov
ondemand
(This is the default on Ubuntu, for example.)
$ (benchmark your test)
$ setgov performance
$ (benchmark your test)

Here are my results using: dd if=/dev/zero of=tmp.img bs=1024 count=102400

native: 391 MB/s
fusexmp_fh, ondemand: 27.2 MB/s
fusexmp_fh, performance: 61.6 MB/s

It's not close to native, but it helps a little.

-Mike
From: Mike S. <ma...@gm...> - 2013-11-19 20:54:21
Hi Goswin, thanks for your feedback.

On Fri, Nov 15, 2013 at 9:55 AM, Goswin von Brederlow <gos...@we...> wrote:
> If your filesystem itself does not use splice then all that does is
> pass the data through splice just to then copy it in userspace. You
> need to support splice in your FS and splice the data directly from
> the kernel to the disk or socket.

Can you clarify what you mean here (preferably with an example? :). My understanding from various newsgroup posts has been that the read_buf/write_buf functions in fusexmp_fh.c already support splicing. I do see different code paths in fuse_buf_copy_one() getting executed when comparing '-osplice_read -osplice_write' to '-ono_splice_read -ono_splice_write', but I do not see any significant difference in execution time. Are we talking about the same thing here? How exactly should I be running FUSE, and what should I be looking at, to know that I have splicing enabled correctly?

Or, perhaps more succinctly, so we don't deviate too far from the goal: what is the best way, today, to run a FUSE loopback filesystem (like the fusexmp/fusexmp_fh examples) to maximize performance?

> pinning the process using fuse and fuse on the same core can also
> help. Or not. Depends a bit on the use case.

Yeah, that could be useful in certain circumstances. In my case I have many processes running at the same time in the FUSE fs, so I need to run the sub-processes on different cores in order to take advantage of parallelization.

Thanks again,
-Mike
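[Editor's note: for readers following along, the FD-based write_buf being discussed looks roughly like the following. This is a sketch of the libfuse 2.9 high-level API as used in fusexmp_fh.c, reproduced from memory and not compiled here:]

```c
/* Sketch of an FD-based write_buf (libfuse 2.9 high-level API).
 * The destination buffer is described by a file descriptor rather
 * than memory, so fuse_buf_copy() can splice kernel -> fd without
 * an intermediate userspace memcpy. */
static int xmp_write_buf(const char *path, struct fuse_bufvec *buf,
                         off_t offset, struct fuse_file_info *fi)
{
    struct fuse_bufvec dst = FUSE_BUFVEC_INIT(fuse_buf_size(buf));

    (void) path;
    dst.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
    dst.buf[0].fd = fi->fh;   /* fd of the file opened in xmp_open() */
    dst.buf[0].pos = offset;

    /* With -osplice_write this splices; otherwise it falls back to a copy. */
    return fuse_buf_copy(&dst, buf, FUSE_BUF_SPLICE_NONBLOCK);
}
```

The key point is the FUSE_BUF_IS_FD flag on the destination: without it, fuse_buf_copy() has to land the data in a userspace buffer first, which is the "moved the copy, didn't remove it" situation Goswin describes in his reply.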
From: Goswin v. B. <gos...@we...> - 2013-11-26 09:36:39
On Tue, Nov 19, 2013 at 03:54:13PM -0500, Mike Shal wrote:
> Hi Goswin, thanks for your feedback.
>
> Can you clarify what you mean here (preferably with an example? :). My
> understanding from various newsgroup posts has been that the
> read_buf/write_buf functions in fusexmp_fh.c already supported splicing.
> [...] Or how exactly should I be executing FUSE & what should I be
> looking at to know if I have splicing enabled correctly?

If you enable splicing and use read_buf/write_buf, but the filesystem uses a memory-based buffer for fuse_buf_copy() to copy from/to, then you have just moved the copying of the data from the kernel to user space and added the overhead of splicing the data in the kernel. So you get less performance.

For splicing to make sense, the filesystem needs to use an FD-based buf so the data is actually spliced directly from/to the FD without copying. The fusexmp_fh.c example does this.

Note: I recently tried using splice() in an attempt to write a faster cp tool, and to my surprise I couldn't get it to be faster. For some reason a simple read()/write() loop beat me every time. So maybe splicing in the kernel is just screwed up and slower even when it saves memcpy() calls.

MfG
        Goswin