From: Mike S. <ma...@gm...> - 2012-09-11 03:40:41
|
Hello,

I would like to avoid having the read() and write() calls go through my
user fs, but rather go directly to the underlying filesystem that I'm
mirroring. Essentially I want to let fuse know of the "real" file
descriptor somehow in the open() call, and when the fuse kernel side gets
a read() or write(), it will go back to the VFS layer with that file
descriptor instead of going out to my process (which just does a pread()
or pwrite() anyway). I would still like to get any other calls, such as
unlink(), chown(), access(), etc. This has been discussed a while back:

http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/5946/focus=5947

Has it been tried at all and found to be unworkable? Or is there just not
much interest?

Some other similar threads I've found suggested a few other performance
improvements for write(), such as using -obig_writes, using write_buf()
instead of write(), and using direct_io. I tried these approaches all in
the same test scenario with my program and came up with the following
results:

4.200s: default fuse fs (4k writes) with 582176 write() calls
4.252s: write_buf with 582176 write() calls
2.063s: direct_io (seems this enables 32k writes automatically?) with 72776 write() calls
2.510s: -obig_writes (32k writes) with 72776 write() calls
0.419s: baseline without fuse at all (the closer to this, the better :)

(Note the absolute values are meaningless - this is just to compare one
against another.)

So write_buf doesn't seem to help my program at all, while increasing the
buffer size helps quite a bit due to the fewer write calls.

I then ran callgrind on the direct_io version to see where the time is
actually going. The total Ir for this test is 258M, with 142M (55%) going
to fuse_lib_write_buf() and 49M (19%) going to fuse_lib_read(). Within
fuse_lib_write_buf(), the major parts are:

67M: get_path_nullok()
17M: fuse_fs_write_buf()
22M: free_path()
27M: fuse_reply_write()

It seems a lot of effort is going into getting/freeing the path for each
32k chunk of data written. In my case I don't care about the path at all
for read & write, so this is unnecessary overhead for me. As a quick test
I tried to comment out the calls to get_path_nullok() and free_path() in
both fuse_lib_read() and fuse_lib_write_buf(), but this only shaved off
~100ms or so.

Do you think it is feasible to be able to provide fuse with a "real" fd
during open() so that read/write in the userspace side of things can be
skipped entirely? I'd be happy to try to work up a patch, though any
guidance would be appreciated :)
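For reference, the read/write path in my mirror fs is essentially the
fusexmp_fh pattern - a minimal sketch (handler names illustrative;
assumes <fuse.h>, <fcntl.h>, <unistd.h>, and <errno.h>):

static int xmp_open(const char *path, struct fuse_file_info *fi)
{
	int fd = open(path, fi->flags);
	if (fd == -1)
		return -errno;
	fi->fh = fd;	/* the "real" fd I'd like the kernel to use directly */
	return 0;
}

static int xmp_read(const char *path, char *buf, size_t size, off_t offset,
		    struct fuse_file_info *fi)
{
	int res = pread(fi->fh, buf, size, offset);
	return res == -1 ? -errno : res;
}

static int xmp_write(const char *path, const char *buf, size_t size,
		     off_t offset, struct fuse_file_info *fi)
{
	int res = pwrite(fi->fh, buf, size, offset);
	return res == -1 ? -errno : res;
}

All the kernel would need from open() is that fd; everything my read and
write handlers do afterwards is a plain pread()/pwrite() on it.

Thanks!
-Mike
|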
From: Miklos S. <mi...@sz...> - 2012-09-13 12:47:00
|
Mike Shal <ma...@gm...> writes:

> Hello,
>
> I would like to avoid having the read() and write() calls go through
> my user fs, but rather go directly to the underlying filesystem that
> I'm mirroring. [...]
>
> 4.200s: default fuse fs (4k writes) with 582176 write() calls
> 4.252s: write_buf with 582176 write() calls
> 2.063s: direct_io (seems this enables 32k writes automatically?) with
> 72776 write() calls
> 2.510s: -obig_writes (32k writes) with 72776 write() calls
> 0.419s: baseline without fuse at all (the closer to this, the better :)

Even the big_writes number comes out at about 900MB/s (72776 writes of
32k is ~2.3GB, moved in 2.510s), which to me doesn't sound bad at all.
True, the baseline is six times better, but is that really important?

And the performance of fuse can be improved further. For example Pavel
Emelyanov is working on a patchset that allows the kernel to cache
writes, just like any other filesystem, bringing the cached write
performance up to the baseline you measured.

> [...]
>
> Do you think it is feasible to be able to provide fuse with a "real"
> fd during open() so that read/write in the userspace side of things
> can be skipped entirely? I'd be happy to try to work up a patch,
> though any guidance would be appreciated :)

It's feasible, and it's not even complicated, which makes it very
tempting. But it solves a special case only, and doesn't improve
anything else. It's not a generic solution.

If we've done everything to improve the performance of the general case
and it's still not good enough, then I'm open to adding optimizations
for special cases like this. Until then you can help with implementing
or testing performance improvements.

Thanks,
Miklos
|
From: Mike S. <ma...@gm...> - 2013-03-31 18:44:20
Attachments:
passthrough-fuse.patch
passthrough-linux.patch
|
Hello again, hope you don't mind revisiting this topic, but I have an
example patch and some more benchmarks...

On Thu, Sep 13, 2012 at 8:48 AM, Miklos Szeredi <mi...@sz...> wrote:
> Mike Shal <ma...@gm...> writes:
> > [...]
> > 4.200s: default fuse fs (4k writes) with 582176 write() calls
> > 4.252s: write_buf with 582176 write() calls
> > 2.063s: direct_io (seems this enables 32k writes automatically?) with
> > 72776 write() calls
> > 2.510s: -obig_writes (32k writes) with 72776 write() calls
> > 0.419s: baseline without fuse at all (the closer to this, the better :)
>
> Even the big_writes number comes out at about 900MB/s, which to me
> doesn't sound bad at all. True, the baseline is six times better, but
> is that really important?

Yes, I believe it is that important. As one example, I am trying to link
a large number of files. In the native file-system, the linker runs in
18.986s. However, when I run the same link through fuse (using
fusexmp_fh), it takes 47.232s (148% longer than native). With a patch to
allow read/write passthrough, this goes down to 24.754s (30% longer than
native).

Here are a few other examples:

1) Large ~3GB read (cat bigfile.txt > /dev/null)
native fs: 0.279s
fuse: 1.392s (~5x slower)
fuse passthrough: 0.279s (no difference!)

2) Large (100MB) write (dd bs=1M count=100 if=/dev/zero of=outfile)
native fs: 0.048s
fuse: 0.609s (~12x slower)
fuse passthrough: 0.048s (no difference!)

Note that in all cases, the speed of the underlying disk is irrelevant
since everything is cached.

I think this is significant enough to warrant adding the functionality
to FUSE.

> And the performance of fuse can be improved further. For example Pavel
> Emelyanov is working on a patchset that allows the kernel to cache
> writes, just like any other filesystem, bringing the cached write
> performance up to the baseline you measured.

I'd be happy to perform other tests if you can provide some details on
how to run them (changes to fusexmp_fh). I don't see how caching writes
would help for cases like this though - read performance is also a major
concern.

> It's feasible, and it's not even complicated, which makes it very
> tempting. But it solves a special case only, and doesn't improve
> anything else. It's not a generic solution.

I don't think it is too special a case - a number of people have asked
about this in the past:

http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/9122/focus=9136
http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/5946/focus=5947

As well as people being concerned with fusexmp performance:

http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/12222/focus=12224
http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/10842/focus=10858

> If we've done everything to improve the performance of the general case
> and it's still not good enough, then I'm open to adding optimizations
> for special cases like this. Until then you can help with implementing
> or testing performance improvements.

Again, if you have specific recommendations please let me know and I
will re-run my linker test (and read/write tests) to compare. However,
given the performance benefits, I really think you should consider
adding support for read/write passthrough to FUSE.

The patches I was testing with are attached. I don't know how to
properly implement the kernel side though, so if there is a better way
to do that please let me know and I can create a proper set of patches.

Thanks,
-Mike
|
From: Sven U. <sve...@gm...> - 2013-03-31 19:36:43
|
Well, the idea certainly has my vote - I would guess that quite a few
fuse fs would benefit from this, as it seems to be a fairly common case.

Sven
--
This mail was sent from a mobile phone, which may excuse its terseness
of expression, originality of spelling, and loveless formatting (TOFU
included).

-----Original Message-----
From: Mike Shal <ma...@gm...>
To: Miklos Szeredi <mi...@sz...>
Cc: fus...@li...
Sent: Sun, 31 Mar 2013 20:46
Subject: Re: [fuse-devel] bypassing read/write for mirror fs

[...]
|
From: Fox, K. M <kev...@pn...> - 2013-04-01 15:32:05
|
Me too. I'd like to use it in a supercomputing application but am
concerned latency would be too high. Earlier in the post it was
mentioned that fuse was able to push 900MB/s of bandwidth, but it's the
added latency that can also kill performance. As far as I know, no one
has ever really tested it, though. Mike's post implies that it is
actually an issue.

Thanks,
Kevin

________________________________________
From: Sven Utcke [sve...@gm...]
Sent: Sunday, March 31, 2013 12:36 PM
To: fus...@li...
Subject: Re: [fuse-devel] bypassing read/write for mirror fs

[...]
|
From: Nikolaus R. <Nik...@ra...> - 2013-04-02 03:25:02
|
Mike Shal <mar...@pu...> writes:
>> > I then ran callgrind on the direct_io version to see where the time is
>> > actually going. The total Ir for this test is 258M, with 142M (55%)
>> > going to fuse_lib_write_buf() and 49M (19%) going to fuse_lib_read().
>> > Within fuse_lib_write_buf(), the major parts are:
>> >
>> > 67M: get_path_nullok()
>> > 17M: fuse_fs_write_buf()
>> > 22M: free_path()
>> > 27M: fuse_reply_write()
>> >
>> > It seems a lot of effort is going into getting/freeing the path for
>> > each 32k chunk of data written. [...]
>> >
>> > Do you think it is feasible to be able to provide fuse with a "real"
>> > fd during open() so that read/write in the userspace side of things
>> > can be skipped entirely? I'd be happy to try to work up a patch,
>> > though any guidance would be appreciated :)

Have you tried using the low-level API as well? Maybe that allows you to
reap a big amount of the same benefits at a fraction of the complexity.
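In the low-level API the read handler gets the fuse_file_info directly
and there is no per-request path lookup - roughly something like this
(an illustrative sketch, cf. hello_ll.c in the libfuse examples):

static void ll_read(fuse_req_t req, fuse_ino_t ino, size_t size,
		    off_t off, struct fuse_file_info *fi)
{
	/* fi->fh holds the backing fd stored by the open handler */
	char *buf = malloc(size);
	ssize_t res;

	if (buf == NULL) {
		fuse_reply_err(req, ENOMEM);
		return;
	}
	res = pread(fi->fh, buf, size, off);
	if (res == -1)
		fuse_reply_err(req, errno);
	else
		fuse_reply_buf(req, buf, res);
	free(buf);
}

Best,
-Nikolaus
--
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
|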
From: Mike S. <ma...@gm...> - 2013-04-02 14:46:12
|
Hi Nikolaus,

On Mon, Apr 1, 2013 at 11:24 PM, Nikolaus Rath <Nik...@ra...> wrote:
> Have you tried using the low-level API as well? Maybe that allows you
> to reap a big amount of the same benefits at a fraction of the
> complexity.

I have not tried the low-level API. Do you have some details as to why
you think that would help performance? Or, do you know of a
loopback-style filesystem (similar to fusexmp) implemented with the
low-level interface that I could try out to get some numbers? The only
example I've found is hello_ll.c, which is not a loopback fs.

Also, it looks like all reads and writes will still go through the
low-level interface, so I don't see how that would help in this case.

Thanks,
-Mike
|
From: Fox, K. M <kev...@pn...> - 2013-04-02 15:41:40
|
You should be able to set .flag_nullpath_ok = 1 on the struct
fuse_operations of your file system. It should let you avoid needing to
go to the low-level API.
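For example (a sketch - the handler names are placeholders for your
own):

static struct fuse_operations xmp_oper = {
	.open		= xmp_open,
	.read		= xmp_read,
	.write		= xmp_write,
	/* ... other handlers ... */
	.flag_nullpath_ok = 1,	/* path may be NULL for fd-based ops
				   like read/write/flush/release */
};

Thanks,
Kevin

________________________________________
From: Nikolaus Rath [Nik...@ra...]
Sent: Monday, April 01, 2013 8:24 PM
To: fus...@li...
Subject: Re: [fuse-devel] bypassing read/write for mirror fs

[...]
|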
From: Sven U. <sve...@gm...> - 2013-04-02 08:58:48
|
> Have you tried using the low-level API as well? Maybe that allows
> you to reap a big amount of the same benefits at a fraction of the
> complexity.

Not sure about possible benefits (why would this be faster?), but as far
as complexity is concerned: many of the cases where such a pass-through
functionality comes in handy explicitly deal with filenames, so I
suspect this might get quite a bit more complicated - and not only once,
in the library, but in each individual application program. Doesn't
sound too good to me.

Sven
--
    _  ___  ___  ___   The dCache File System
  __| |/ __|| __|/ __|  An archive file-system for PB of data
 / _` | (__ | _| \__ \  http://www.desy.de/~utcke/Data/
 \__,_|\___||_| |___/   http://www.dr-utcke.de/
|
From: Nikolaus R. <Nik...@ra...> - 2013-04-03 03:18:34
|
Sven Utcke <sve...@pu...> writes:
>> Have you tried using the low-level API as well? Maybe that allows
>> you to reap a big amount of the same benefits at a fraction of the
>> complexity.
>
> Not sure about possible benefits (why would this be faster?)

It would allow splicing the data to/from the fuse device, and maybe save
overhead on pathname lookup (I'm not sure how much of this can be
achieved with the nullpath_ok option these days).

> but as far as complexity is concerned: many of the cases where such a
> pass-through functionality comes in handy explicitly deal with
> filenames, so I suspect this might get quite a bit more complicated -
> and not only once, in the library, but in each individual application
> program.

Well, if it turns out that the bottleneck is in the high-level API (note
that I'm not claiming that, I'm just suggesting the possibility) one
could most likely just extend the high-level API appropriately.

Best,
-Nikolaus
--
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
|
From: Goswin v. B. <gos...@we...> - 2013-04-02 13:41:21
|
On Sun, Mar 31, 2013 at 02:44:10PM -0400, Mike Shal wrote:
> Hello again, hope you don't mind revisiting this topic, but I have an
> example patch and some more benchmarks...
>
> Here are a few other examples:
>
> 1) Large ~3GB read (cat bigfile.txt > /dev/null)
> native fs: 0.279s
> fuse: 1.392s (~5x slower)
> fuse passthrough: 0.279s (no difference!)
>
> 2) Large (100MB) write (dd bs=1M count=100 if=/dev/zero of=outfile)
> native fs: 0.048s
> fuse: 0.609s (~12x slower)
> fuse passthrough: 0.048s (no difference!)
>
> [...]
>
> I'd be happy to perform other tests if you can provide some details on
> how to run them (changes to fusexmp_fh). I don't see how caching
> writes would help for cases like this though - read performance is
> also a major concern.

So how much faster does fuse get with big writes (and I mean 128k or
more here) and with splice operations for the same tests?

Regards,
	Goswin
|
From: Fox, K. M <kev...@pn...> - 2013-04-02 15:37:33
|
Big write support only buys you so much. If all your applications use
smaller writes, and they work on the underlying fs OK, it will be hard
to convince all the software writers to change their code.

Thanks,
Kevin

________________________________________
From: Goswin von Brederlow [gos...@we...]
Sent: Tuesday, April 02, 2013 6:41 AM
To: fus...@li...
Subject: Re: [fuse-devel] bypassing read/write for mirror fs

[...]
|
From: Mike S. <ma...@gm...> - 2013-04-02 14:58:40
|
Hi Goswin,

On Tue, Apr 2, 2013 at 9:41 AM, Goswin von Brederlow <gos...@we...> wrote:
> So how much faster does fuse get with big writes (and I mean 128k or
> more here) and with splice operations for the same tests?

Here are my results:

A) ./fusexmp_fh -obig_writes
1) link test: 45.149s (~2 second improvement, still 137% longer than native)
2) read test: no change
3) write test: 0.173s (now 3.5x slower, rather than 12x slower)

So it seems for the case I really care about (the end-to-end linking
time), writing is a small portion of the total time. However, it does
speed up the write-only test significantly using a 128k buffer instead
of the default 4k buffer. It is still 3.5x slower, whereas the
passthrough implementation achieves native speeds.

B) ./fusexmp_fh -osplice_write -osplice_read
1) link test: 47.339s (no real change over the default fuse)
2) read test: 0.656s (twice as fast as default fuse, but still twice as slow as native)
3) write test: 0.545s (slightly better than default fuse, but still 11x slower than native)

I also tried with -osplice_move, but for some reason that makes all
reads pull from the disk rather than the cache. This makes the link test
and read test pretty abysmal:

C) ./fusexmp_fh -osplice_move -osplice_write -osplice_read
1) link test: 1m0.154s
2) read test: 7.536s

I don't really know what's going on there, though (maybe I'm using it
wrong?)

In all, it seems these options help a little bit, but nowhere near as
much as a passthrough implementation.

Any other thoughts / suggestions to try?
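For reference, the write path these splice options exercise is
fusexmp_fh's write_buf() handler, which hands the incoming buffers to
fuse_buf_copy() against the backing fd - roughly this (paraphrased from
the libfuse 2.9-era example):

static int xmp_write_buf(const char *path, struct fuse_bufvec *buf,
			 off_t offset, struct fuse_file_info *fi)
{
	struct fuse_bufvec dst = FUSE_BUFVEC_INIT(fuse_buf_size(buf));

	/* copy (or splice) straight into the file behind fi->fh */
	dst.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
	dst.buf[0].fd = fi->fh;
	dst.buf[0].pos = offset;

	return fuse_buf_copy(&dst, buf, FUSE_BUF_SPLICE_NONBLOCK);
}

Even with splicing, each chunk still makes the round trip through
userspace, which is the overhead the passthrough patch avoids.

Thanks,
-Mike
|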
From: Goswin v. B. <gos...@we...> - 2013-04-09 11:22:51
|
On Tue, Apr 02, 2013 at 10:58:32AM -0400, Mike Shal wrote:
> Here are my results:
>
> A) ./fusexmp_fh -obig_writes
> 1) link test: 45.149s (~2 second improvement, still 137% longer than native)
> 2) read test: no change
> 3) write test: 0.173s (now 3.5x slower, rather than 12x slower)
>
> So it seems for the case I really care about (the end-to-end linking
> time), writing is a small portion of the total time. However, it does
> speed up the write-only test significantly using a 128k buffer instead
> of the default 4k buffer. It is still 3.5x slower, whereas the
> passthrough implementation achieves native speeds.

Obviously -obig_writes only affects big writes and not links (or reads).
No surprise there.

But you can see that 128k buffers help a lot. Even bigger buffers help
even more.

> B) ./fusexmp_fh -osplice_write -osplice_read
> 1) link test: 47.339s (no real change over the default fuse)
> 2) read test: 0.656s (twice as fast as default fuse, but still twice as slow as native)
> 3) write test: 0.545s (slightly better than default fuse, but still 11x slower than native)

And ./fusexmp_fh -osplice_write -osplice_read -obig_writes?

> I also tried with -osplice_move, but for some reason that makes all
> reads pull from the disk rather than the cache. This makes the link
> test and read test pretty abysmal:
>
> C) ./fusexmp_fh -osplice_move -osplice_write -osplice_read
> 1) link test: 1m0.154s
> 2) read test: 7.536s
>
> I don't really know what's going on there, though (maybe I'm using it
> wrong?)

That sounds like it is disabling caching in some unexpected way.

> In all, it seems these options help a little bit, but nowhere near as
> much as a passthrough implementation.
>
> Any other thoughts / suggestions to try?

Task switching to fuse and back will always be an overhead and
passthrough will always be a bit faster. What surprises me is that there
is still that much overhead.

How large are the read requests? Maybe those can be tuned more? Bigger
read-ahead or larger requests?

For writes, wasn't there recently a patch to improve caching and page
writeback for fuse? Combined with larger (even larger than 128k) writes,
fuse should get nearer to the passthrough performance.

Regards,
	Goswin
|
From: Mike S. <ma...@gm...> - 2013-04-09 15:21:21
|
Hi Goswin, thanks for the feedback. My results are below:

On Tue, Apr 9, 2013 at 7:22 AM, Goswin von Brederlow <gos...@we...> wrote:
> Obviously -obig_writes only affects big writes and not links (or
> reads). No surprise there.
>
> But you can see that 128k buffers help a lot. Even bigger buffers help
> even more.

The 128k buffer only helps a little bit for the link time, which is the
case I care about the most. It helps more for the write-only case, but
that is a simple benchmark, not a real-world test case.

> And ./fusexmp_fh -osplice_write -osplice_read -obig_writes?

With -osplice_write -osplice_read -obig_writes I get:

1) link test: 45.622s
2) read test: 0.700s
3) write test: 0.155s

> Task switching to fuse and back will always be an overhead and
> passthrough will always be a bit faster. What surprises me is that
> there is still that much overhead.
>
> How large are the read requests? Maybe those can be tuned more? Bigger
> read-ahead or larger requests?

For the link test, read sizes (as measured by printing out the 'size'
argument in read_buf()) are anywhere from 4k to 128k. Maybe the
variation is because of how the linker is reading the data - it probably
doesn't read the whole file in at once, but seeks around and reads the
parts it needs. Just a guess, though.

For reference, there are 124072 calls to read_buf() and 225079 calls to
write_buf() in the link test (measured using fusexmp_fh -osplice_write
-osplice_read -obig_writes). Making these numbers smaller by using
different buffer sizes may help somewhat, as shown by the small
improvement using -obig_writes. However, with a passthrough
implementation, these numbers are 0.

> For writes, wasn't there recently a patch to improve caching and page
> writeback for fuse? Combined with larger (even larger than 128k)
> writes, fuse should get nearer to the passthrough performance.

This would not do anything for read performance though, correct? In my
link test, I can temporarily ignore the write side of the problem by
specifying /dev/null as the output library. In this case, there are only
52 calls to write_buf() (there is a temporary file written listing the
object files), so we can see how much just using passthrough on read()
requests will help. Here are my numbers:

native: 16.807s
default fuse: 27.471s
splice_read/write and big_writes: 27.059s
fuse passthrough: 22.597s

Here is a summary of the benchmarks so far for the link test (my
real-world use case) from best to worst:

native: 18.986s
passthrough: 24.754s
-obig_writes: 45.149s
-osplice_write -osplice_read -obig_writes: 45.622s
fusexmp_fh defaults: 47.232s
-osplice_write -osplice_read: 47.339s

-Mike
|
From: Feng S. <ste...@gm...> - 2013-04-17 03:55:18
|
Hi Mike,

I reviewed the two patches. Putting aside some kernel implementation
issues (locking, reference counting, etc.), it uses fuse_fh to pass the
fd, which should be opaque to the fuse kernel. Also, the maintenance of
the fd (open/close) will be really complicated in a real file system
implementation (not fusexmp_fh :-)... Anyway, this patch is a very cool
adventure. We just need to think more about it.

On Tue, Apr 9, 2013 at 11:21 PM, Mike Shal <ma...@gm...> wrote:
> [...]
>
> Here is a summary of the benchmarks so far for the link test (my
> real-world use case) from best to worst:
>
> native: 18.986s
> passthrough: 24.754s
> -obig_writes: 45.149s
> -osplice_write -osplice_read -obig_writes: 45.622s
> fusexmp_fh defaults: 47.232s
> -osplice_write -osplice_read: 47.339s

--
Feng Shuo
Tel: (86)10-59851155-2116
Fax: (86)10-59851155-2008
Tianjin Zhongke Blue Whale Information Technologies Co., Ltd
10th Floor, Tower A, The GATE building, No. 19 Zhong-guan-cun Avenue
Haidian District, Beijing, China
Postcode 100080
|
From: Mike S. <ma...@gm...> - 2013-04-18 15:09:17
|
Thanks for taking the time to review.

On Tue, Apr 16, 2013 at 11:54 PM, Feng Shuo <ste...@gm...> wrote:
> I reviewed the two patches. Putting aside some kernel implementation
> issues (locking, reference counting, etc.), it uses fuse_fh to pass
> the fd, which should be opaque to the fuse kernel. Also, the
> maintenance of the fd (open/close) will be really complicated in a
> real file system implementation (not fusexmp_fh :-)... Anyway, this
> patch is a very cool adventure. We just need to think more about it.

Yeah, as I mentioned I don't really know the correct way to do this
(particularly the kernel side). I'd be happy to work on a proper patch,
but I could use some guidance on the correct route to go. The primary
goal of this implementation was to get some actual benchmarks to show
that such a thing is actually useful in real-world situations and
file-systems, where the currently available optimizations don't offer
much help.

Thanks again,
-Mike
|
From: Mike S. <ma...@gm...> - 2013-05-22 18:27:37
|
On Thu, Apr 18, 2013 at 11:09 AM, Mike Shal <ma...@gm...> wrote:
> Yeah, as I mentioned I don't really know the correct way to do this
> (particularly the kernel side). I'd be happy to work on a proper
> patch, but I could use some guidance on the correct route to go. The
> primary goal of this implementation was to get some actual benchmarks
> to show that such a thing is actually useful in real-world situations
> and file-systems, where the currently available optimizations don't
> offer much help.

Ping - can anyone provide some guidance on how to implement read/write
passthrough properly? The performance benefits are quite nice for
mirrored filesystems.

Thanks,
-Mike
|