From: Miklos S. <mi...@sz...> - 2008-06-14 09:59:52
> > a) where the filesystem gets its data from and what the user of the
> > filesystem does with the data
>
> The data will come entirely from block devices, as in a typical
> filesystem (like ext3 or whatever). The device will be a RAID device
> (hardware or software) which would be sourcing its data through other
> block devices - primarily SCSI, iSCSI or Infiniband.
>
> The user of the filesystem will principally be Apache - which will be
> sending the files out over TCP. Apache uses sendfile to do this. The
> TCP is offloaded through a TOE - which uses sendfile (et al) to
> bus-master the block directly out of the page cache to the network.
> The CPU literally never touches or moves the data. A few other users
> of the data are some other (non-TCP) network stacks and file export
> (via Infiniband) transports which use the same zero-copy SG
> directly-out-of-page-cache method (a la sendfile).

What sort of data is this, small files or large files?

Why aren't you using a normal high-performance filesystem on top of
those block devices?

Are you aware of the fact that context switches between the caller
(Apache) and the filesystem daemon can be a significant performance
issue with fuse? This usually dominates CPU usage, not memory copies,
and is even harder to eliminate.

> > b) some performance data (bandwidth, CPU usage) with the current
> > fuse setup
>
> The hardware that will ultimately run this actual setup isn't
> available yet. I also need to know the optimal way to handle the I/O
> with FUSE as-is (like the "read" vs. mmapped "readpage" stuff I was
> asking about before).

Mmap is almost always the wrong answer to performance problems, because
setting up the memory mapping is going to be far slower than a memory
copy. And it wouldn't even eliminate the memory copy from the device's
page cache to the filesystem's page cache.

I'd say it's impossible to design a solution without actually having a
means to test it out.
We may come up with some perfect zero-copy solution using splice() or
some other mechanism, and yet it may still be irrelevant because of
some other performance limitation.

> > c) what changes you propose to improve the performance, and how
> > much you expect the performance to improve (preferably with a
> > prototype and actual measurements).
>
> I'm less in the phase of proposing changes than of understanding
> FUSE's capabilities and how it works - as well as coming a bit up a
> learning curve on some VFS stuff.

That's cool :)

> What I need is just the basic ability to do zero-copy readpage
> support to a block device - just like other filesystems (like ext3)
> do.

Look at splice(). It's the most promising interface for this sort of
thing, and it might be made usable on the fuse device (it won't work
now; it would need additional code in the fuse kernel module). But I'm
not familiar enough with that interface to say for sure whether it
will work or not.

I'd also suggest that you look at some solutions not involving fuse.
Being in userspace is nice and all that, but it will never have the
same performance as an in-kernel solution.

Miklos