From: <Bra...@sc...> - 2008-07-19 01:16:52
> Could you please test with increased, 1 MB FUSE block size too, so we
> would have an apple-to-apple comparison?

I will do that. I subsequently ran one other test, though: I ran the same "dd" test on the raw block device (the one my FUSE filesystem was backed by). On the first run of the test, the numbers through my bypass were 100% identical to those of the raw block device (131 MB/sec, all the same CPU loads, context switches, etc.). This was both encouraging and expected. On the second and subsequent tests, the rate went right up to 2.8 GB/sec, which obviously meant the data was coming from cache. BTW, when I unloaded and re-loaded my bypass shim (which "closed" and "re-opened" the block device), the test would go back down to 131 MB/sec - but only for the next run. I.e., my having the device open was keeping the cache alive. (Which ties into...)

> Do you drop_caches between the tests (or remount the volume)?

No - but I think the way I am chaining the calls to the block device bypasses the cache. As I said above, when doing the I/Os directly to the block device while my shim has the block device open, I see the effect of caching, but I do *not* see it when talking to my FUSE FS through my bypass. (Did what I just said make any sense?) I think it is because I am chaining fuse_readpages to mpage_readpages - and not to something like blkdev_readpages. (I think that's what it's called - I'm not looking at the code right now.) In any case, I am pretty sure I would have to manually add the block to the cache when using mpage_readpages, which I'm not doing.

As a corollary, this is both a problem and a benefit. For most people (like NTFS-3G), I would assume you naturally want to use the cache. For my own (selfish) needs, this isn't necessarily so, for two reasons: #1.
I am building a distributed, networked, cluster filesystem and have cache-coherency issues between nodes to worry about - so just using the existing Linux caches may not be the right way to go (but the jury isn't in on that yet). #2. We are using this in a video application, which has some very unique cache and prefetch requirements, so a normal LRU-type algorithm is not best for us - we have other algorithms that we may want to use.

So I am struggling with whether to let Linux cache it or not - making it optional - and, if Linux doesn't, whether I should, and how to allow external cache and prefetch policy management while still meeting the goal of minimizing copies and avoiding constant user-space interaction for every I/O.

I'm still thinking about #2 - but other than that, I'm very, very happy with the performance I'm seeing.
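For anyone wanting to reproduce the warm- vs cold-cache effect in the "dd" runs above: the usual tricks are dropping the page cache between runs (root only) or bypassing it with O_DIRECT via dd's iflag=direct. A minimal sketch - against a scratch file rather than a real block device; the paths and sizes are placeholders, and a faithful re-run of my test would target the raw device instead:

```shell
# Sketch: warm- vs cold-cache "dd" reads against a scratch file.
# (A real test would read the raw block device, e.g. /dev/sdX,
# which needs appropriate privileges.)

FILE=$(mktemp)
dd if=/dev/zero of="$FILE" bs=1M count=16 conv=fsync 2>/dev/null

# Warm read: likely served from the page cache on repeat runs.
dd if="$FILE" of=/dev/null bs=1M

# Cold read, option 1: drop all caches between runs (root only):
#   sync; echo 3 > /proc/sys/vm/drop_caches

# Cold read, option 2: bypass the page cache entirely with O_DIRECT:
#   dd if="$FILE" of=/dev/null bs=1M iflag=direct

rm -f "$FILE"
```

Note that dd prints its throughput summary to stderr, and iflag=direct requires the underlying filesystem or device to support O_DIRECT.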