Thread: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse


From: VenkataRao <ven...@gm...> - 2011-10-27 21:03:17

Hi,

I'm fairly new to FUSE development. Our file-system implementation
requires the buffer addresses and file offsets of all read and write
requests from libfuse to be aligned to 512 bytes.

We are implementing a file system backed by a hard disk. The file system
opens the disk device in O_DIRECT mode, and O_DIRECT requires the buffer
and offset of every read and write request to be aligned to sector
boundaries.
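
To make the requirement concrete, here is a minimal sketch of the
alignment test involved, assuming a 512-byte logical sector size. The
helper names (align_down, align_up, is_sector_aligned) are made up for
illustration and are not part of libfuse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumed logical sector size */

/* Round x down to a multiple of a; a must be a power of two. */
static uint64_t align_down(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return x & ~(a - 1);
}

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
    return align_down(x + a - 1, a);
}

/* True when buffer address, file offset and length are all multiples of
 * SECTOR_SIZE, i.e. the request could go to an O_DIRECT fd unchanged. */
static bool is_sector_aligned(const void *buf, uint64_t off, size_t len)
{
    return (uintptr_t)buf % SECTOR_SIZE == 0 &&
           off % SECTOR_SIZE == 0 &&
           len % SECTOR_SIZE == 0;
}
```

A 4-byte request at offset 520, for instance, is not aligned; the
enclosing aligned range runs from align_down(520, 512) = 512 to
align_up(524, 512) = 1024.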

Is there any way, such as a mount option, to get these buffer addresses
and file offsets aligned to 512 bytes when a read or write request
reaches the file system from libfuse?

If so, could you please point me to how to do it?

Thanks for your attention and time.

Regards,
VenkataRao.

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Alain S. <asp...@gm...> - 2011-10-30 23:13:30

On Thu, Oct 27, 2011 at 11:03 PM, VenkataRao
<ven...@gm...> wrote:
> Hi,
>
> I'm kind of new to this fuse development and our file-system
> implementation requires the buffer address & file offsets to all read
> & write requests from lib-fuse to be aligned with 512 bytes.

Not aligned to a page boundary (4K) instead? Just asking.

>
> We are implementing a file system on a backed hard disk, where in, the
> file-system opens the disk device in O_DIRECT mode  and eventually
> O_DIRECT requires for all the read & write requests must have the
> buffer & offset aligned with sector boundaries.
>
> Is there any way that I can get these buffer address and file offsets
> aligned with 512 bytes when a read or write request comes to the
> file-system from the lib-fuse, like a mount option or some thing else.

I don't know. Probably not.
But I have the same requirement myself, and I use "read before write".

If I get a write of 4 bytes at offset 520, I read the 512-byte block at
offset 512, write the 4 bytes at the appropriate place in it (local
offset 8 = 520-512), and then write the block back to its place.
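
A minimal sketch of this read-before-write scheme in C, assuming 512-byte
sectors and a descriptor that supports pread()/pwrite(). The function
name rmw_write is hypothetical; the bounce buffer comes from
posix_memalign() so that it would also satisfy O_DIRECT.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512 /* assumed sector size */

/* Write len bytes at an arbitrary offset by reading each affected sector
 * into an aligned bounce buffer, patching it, and writing it back.
 * Returns 0 on success or a negative errno value. */
static int rmw_write(int fd, const char *data, size_t len, off_t off)
{
    void *sector = NULL;
    int err;

    assert(off >= 0);
    err = posix_memalign(&sector, SECTOR_SIZE, SECTOR_SIZE);
    if (err != 0)
        return -err;

    while (len > 0) {
        off_t base = off - off % SECTOR_SIZE;   /* start of the sector */
        size_t local = (size_t)(off - base);    /* offset inside it */
        size_t chunk = SECTOR_SIZE - local;
        ssize_t n;

        if (chunk > len)
            chunk = len;

        /* Read the whole sector (a full-sector write could skip this). */
        n = pread(fd, sector, SECTOR_SIZE, base);
        if (n < 0) { err = errno; break; }
        /* Short read at end of file: treat the missing tail as zeros. */
        memset((char *)sector + n, 0, (size_t)(SECTOR_SIZE - n));

        memcpy((char *)sector + local, data, chunk);

        n = pwrite(fd, sector, SECTOR_SIZE, base);
        if (n < 0) { err = errno; break; }
        if (n != SECTOR_SIZE) { err = EIO; break; }

        data += chunk;
        off += (off_t)chunk;
        len -= chunk;
    }
    free(sector);
    return -err;
}
```

For the example above, rmw_write(fd, data, 4, 520) reads the sector at
offset 512, patches bytes 8..11 of it, and writes the sector back.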



>
> If possible, could you guys please kindly guide me how can I get these
> buffer address and file offsets 512 aligned.
>
> Thanks for your attention and time.
>
> Regards,
> VenkataRao.
>



-- 
Alain Spineux                   |  aspineux gmail com
Monitor your iT & Backups |  http://www.magikmon.com
Free Backup front-end       | http://www.magikmon.com/mksbackup
Your email 100% available |  http://www.emailgency.com

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: VenkataRao <ven...@gm...> - 2011-11-01 17:41:57

Please see below for my responses.

On Sun, Oct 30, 2011 at 4:13 PM, Alain Spineux <asp...@gm...> wrote:

> On Thu, Oct 27, 2011 at 11:03 PM, VenkataRao
> <ven...@gm...> wrote:
> > Hi,
> >
> > I'm kind of new to this fuse development and our file-system
> > implementation requires the buffer address & file offsets to all read
> > & write requests from lib-fuse to be aligned with 512 bytes.
>
> Not aligned with a page boundary, aka 4k instead ? Just asking ?
>

[Venkata]: We are using a 2.6 or later kernel, and under Linux 2.6
alignment to 512-byte boundaries suffices, so I think we do not need 4K
boundaries. Older versions (e.g. 2.4) need 4K boundaries.


>
> >
> > We are implementing a file system on a backed hard disk, where in, the
> > file-system opens the disk device in O_DIRECT mode  and eventually
> > O_DIRECT requires for all the read & write requests must have the
> > buffer & offset aligned with sector boundaries.
> >
> > Is there any way that I can get these buffer address and file offsets
> > aligned with 512 bytes when a read or write request comes to the
> > file-system from the lib-fuse, like a mount option or some thing else.
>
> I don't know. Probably not
> But I have the same requirement myself, I use "read before write".
>
> If I get a write of 4bytes to offset  520 then I read 512 bytes of
> block at offset 512
> I write the 4 bytes at the appropriate place (local offset 8 = 520-512) and
> then
> write back the block to appropriate place.


[Venkata]: I can also do read-before-write, but it involves an additional
copy, which hurts file-system performance.


>
> >
> > If possible, could you guys please kindly guide me how can I get these
> > buffer address and file offsets 512 aligned.
> >
> > Thanks for your attention and time.
> >
> > Regards,
> > VenkataRao.
> >
> >

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Alain S. <asp...@gm...> - 2011-11-03 22:54:09

On Tue, Nov 1, 2011 at 6:41 PM, VenkataRao
<ven...@gm...> wrote:
> Please see below for my responses.
>
> On Sun, Oct 30, 2011 at 4:13 PM, Alain Spineux <asp...@gm...> wrote:
>
>> On Thu, Oct 27, 2011 at 11:03 PM, VenkataRao
>> <ven...@gm...> wrote:
>> > Hi,
>> >
>> > I'm kind of new to this fuse development and our file-system
>> > implementation requires the buffer address & file offsets to all read
>> > & write requests from lib-fuse to be aligned with 512 bytes.
>>
>> Not aligned with a page boundary, aka 4k instead ? Just asking ?
>>
>
> [Venkata]:  As we are using > 2.6 kernel and under Linux 2.6 kernels, the
> alignment to 512-byte boundaries suffices, I think we do not need 4k
> boundaries. If you use older versions (e.g 2.4), we need 4K boundaries.
>
>
>>
>> >
>> > We are implementing a file system on a backed hard disk, where in, the
>> > file-system opens the disk device in O_DIRECT mode  and eventually
>> > O_DIRECT requires for all the read & write requests must have the
>> > buffer & offset aligned with sector boundaries.
>> >
>> > Is there any way that I can get these buffer address and file offsets
>> > aligned with 512 bytes when a read or write request comes to the
>> > file-system from the lib-fuse, like a mount option or some thing else.
>>
>> I don't know. Probably not
>> But I have the same requirement myself, I use "read before write".
>>
>> If I get a write of 4bytes to offset  520 then I read 512 bytes of
>> block at offset 512
>> I write the 4 bytes at the appropriate place (local offset 8 = 520-512) and
>> then
>> write back the block to appropriate place.
>
>
> [Venkata]: I also can do read-before-write, but there will be an additional
> copy involved and due to this the file-system performance gets effected.

I don't think you have a choice.
You could maintain a local cache to speed up further writes to the same block,
but you must read it at least the first time!

>
>
>>
>> >
>> > If possible, could you guys please kindly guide me how can I get these
>> > buffer address and file offsets 512 aligned.
>> >
>> > Thanks for your attention and time.
>> >
>> > Regards,
>> > VenkataRao.
>> >
>> >




Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Miklos S. <mi...@sz...> - 2011-11-08 18:58:08

>>> On Thu, Oct 27, 2011 at 11:03 PM, VenkataRao
>>> <ven...@gm...> wrote:
>>> > Hi,
>>> >
>>> > I'm kind of new to this fuse development and our file-system
>>> > implementation requires the buffer address & file offsets to all read
>>> > & write requests from lib-fuse to be aligned with 512 bytes.

Following patch should fix the read buffer alignment.  Fixing the write
buffer alignment properly would be more involved.

You could also try the zero copy interfaces introduced in the latest git
version.  I'm not sure it will work with O_DIRECT files though, but that
would definitely be worth fixing.

Thanks,
Miklos


diff --git a/lib/fuse.c b/lib/fuse.c
index db638ec..f1b2a97 100644
--- a/lib/fuse.c
+++ b/lib/fuse.c
@@ -1698,10 +1698,10 @@ int fuse_fs_read_buf(struct fuse_fs *fs, const char *path,
 			if (buf == NULL)
 				return -ENOMEM;
 
-			mem = malloc(size);
-			if (mem == NULL) {
+			res = -posix_memalign(&mem, getpagesize(), size);
+			if (res != 0) {
 				free(buf);
-				return -ENOMEM;
+				return res;
 			}
 			*buf = FUSE_BUFVEC_INIT(size);
 			buf->buf[0].mem = mem;
@@ -1777,9 +1777,8 @@ int fuse_fs_write_buf(struct fuse_fs *fs, const char *path,
 			    !(buf->buf[0].flags & FUSE_BUF_IS_FD)) {
 				flatbuf = &buf->buf[0];
 			} else {
-				res = -ENOMEM;
-				mem = malloc(size);
-				if (mem == NULL)
+				res = -posix_memalign(&mem, getpagesize(), size);
+				if (res != 0)
 					goto out;
 
 				tmp.buf[0].mem = mem;
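
For reference, the allocation pattern used in the patch can be sketched
standalone as below. posix_memalign() returns 0 or a positive error
number rather than setting errno, which is why the patch negates the
result. The wrapper name alloc_page_aligned is hypothetical, and
sysconf(_SC_PAGESIZE) is used here in place of getpagesize().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate size bytes aligned to the system page size.
 * Returns 0 on success or a negative error number (e.g. -ENOMEM);
 * on success the caller releases *mem with free(). */
static int alloc_page_aligned(void **mem, size_t size)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(page > 0);
    return -posix_memalign(mem, (size_t)page, size);
}
```

With the usual 4096-byte page, a page-aligned buffer is also aligned to
512 bytes.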

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Goswin v. B. <gos...@we...> - 2011-11-19 22:57:52

Miklos Szeredi <mi...@sz...> writes:

>>>> On Thu, Oct 27, 2011 at 11:03 PM, VenkataRao
>>>> <ven...@gm...> wrote:
>>>> > Hi,
>>>> >
>>>> > I'm kind of new to this fuse development and our file-system
>>>> > implementation requires the buffer address & file offsets to all read
>>>> > & write requests from lib-fuse to be aligned with 512 bytes.
>
> Following patch should fix the read buffer alignment.  Fixing the write
> buffer alignment properly would be more involved.

For write buffers, the buffer start needs to sit at an alignment boundary
(512 or 4K) minus the size of the header, so that the payload after the
header ends up aligned. Unfortunately there is no posix_memalign()
equivalent that allocates memory at an offset from an alignment boundary.
But it isn't that hard (although a bit wasteful) to allocate a bigger
buffer and ignore the first X bytes.
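
A rough sketch of that over-allocation trick, assuming the header length
is known before the buffer is allocated. The function name and
parameters are hypothetical; hdr_len stands for whatever precedes the
payload in the message.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>

/* Allocate room for hdr_len + payload_len bytes such that the payload,
 * which starts hdr_len bytes into the message, is aligned to `align`.
 * *base receives the pointer to hand to free(); the return value is
 * where the message (header first) should be stored.
 * Returns NULL on allocation failure. */
static char *alloc_payload_aligned(size_t hdr_len, size_t payload_len,
                                   size_t align, void **base)
{
    size_t skip;

    /* posix_memalign() wants a power of two that is a multiple of
     * sizeof(void *). */
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(align % sizeof(void *) == 0);

    /* Bytes to ignore so that skip + hdr_len is a multiple of align. */
    skip = (align - hdr_len % align) % align;

    if (posix_memalign(base, align, skip + hdr_len + payload_len) != 0) {
        *base = NULL;
        return NULL;
    }
    return (char *)*base + skip;
}
```

With an 80-byte header and 512-byte alignment, for example, 432 bytes
are skipped, so the payload starts exactly 512 bytes into the
allocation.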

Alternatively, wouldn't it be possible to have two buffers, one for the
header and one for the payload data, and use readv() to fill them both
atomically?
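
The scatter mechanics of that idea could look like the sketch below,
shown on an ordinary pipe. It makes no claim about how /dev/fuse behaves
with readv(); read_split and the 8-byte stand-in header are
hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read one message with a single readv() call: the first hdr_len bytes
 * land in hdr, everything after that in payload (which the caller may
 * have allocated with any alignment it needs).
 * Returns the total number of bytes read, or -1 with errno set. */
static ssize_t read_split(int fd, void *hdr, size_t hdr_len,
                          void *payload, size_t payload_cap)
{
    struct iovec iov[2];

    assert(hdr != NULL && payload != NULL);
    iov[0].iov_base = hdr;
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = payload;
    iov[1].iov_len = payload_cap;
    return readv(fd, iov, 2);
}

/* Demo on a pipe: an 8-byte stand-in header plus 16 payload bytes.
 * Returns 0 if header and payload arrived in their own buffers. */
static int read_split_demo(void)
{
    static const char msg[] = "HEADER__payload-16-bytes";
    char hdr[8];
    void *payload = NULL;
    int p[2], ok = -1;

    if (pipe(p) != 0)
        return -1;
    if (posix_memalign(&payload, 512, 512) == 0 &&
        write(p[1], msg, 24) == 24 &&
        read_split(p[0], hdr, sizeof hdr, payload, 512) == 24 &&
        memcmp(hdr, msg, 8) == 0 &&
        memcmp(payload, msg + 8, 16) == 0 &&
        (size_t)payload % 512 == 0)
        ok = 0;
    free(payload);
    close(p[0]);
    close(p[1]);
    return ok;
}
```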

> You could also try the zero copy interfaces introduced in the latest git
> version.  I'm not sure it will work with O_DIRECT files though, but that
> would definitely be worth fixing.

Should work.

> Thanks,
> Miklos

MfG
        Goswin

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Miklos S. <mi...@sz...> - 2011-11-25 18:24:45

On Sat, Nov 19, 2011 at 11:57 PM, Goswin von Brederlow
<gos...@we...> wrote:
> Miklos Szeredi <mi...@sz...> writes:
>
>>>>> On Thu, Oct 27, 2011 at 11:03 PM, VenkataRao
>>>>> <ven...@gm...> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I'm kind of new to this fuse development and our file-system
>>>>> > implementation requires the buffer address & file offsets to all read
>>>>> > & write requests from lib-fuse to be aligned with 512 bytes.
>>
>> Following patch should fix the read buffer alignment.  Fixing the write
>> buffer alignment properly would be more involved.
>
> For write buffers the buffer needs to be aligned to 512/4K - the size of
> the header. Unfortunately there is no posix_memalign() equivalent to
> allocate memory at an offset to alignment. But it isn't that hard
> (although a bit wastefull) to allocate a bigger buffer and ignore the
> first X bytes.
>
> Alternatively wouldn't it be possible to have 2 buffers, one for the
> header, one for payload data, and user readv() to fill them both
> atomically?

Yes, those are possibilities.  The problem is that the parsing of the
message is done on a different layer than the reading of the message,
which makes it pretty difficult to cleanly add those alignment
constraints to the message reading part.

But yes, it's doable.

Thanks,
Miklos

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Goswin v. B. <gos...@we...> - 2011-11-26 04:02:28

Miklos Szeredi <mi...@sz...> writes:

> On Sat, Nov 19, 2011 at 11:57 PM, Goswin von Brederlow
> <gos...@we...> wrote:
>> Miklos Szeredi <mi...@sz...> writes:
>>
>>>>>> On Thu, Oct 27, 2011 at 11:03 PM, VenkataRao
>>>>>> <ven...@gm...> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > I'm kind of new to this fuse development and our file-system
>>>>>> > implementation requires the buffer address & file offsets to all read
>>>>>> > & write requests from lib-fuse to be aligned with 512 bytes.
>>>
>>> Following patch should fix the read buffer alignment.  Fixing the write
>>> buffer alignment properly would be more involved.
>>
>> For write buffers the buffer needs to be aligned to 512/4K - the size of
>> the header. Unfortunately there is no posix_memalign() equivalent to
>> allocate memory at an offset to alignment. But it isn't that hard
>> (although a bit wastefull) to allocate a bigger buffer and ignore the
>> first X bytes.
>>
>> Alternatively wouldn't it be possible to have 2 buffers, one for the
>> header, one for payload data, and user readv() to fill them both
>> atomically?
>
> Yes, those are possibilities.  The problem is, the parsing of the
> message is done on a different layer as the reading of the message.
> Which means it's pretty difficult to cleanly add those alignment
> constraints to the message reading part.
>
> But yes, it's doable.
>
> Thanks,
> Miklos

I know the code. :) I already started on a patch for this a while back,
but then I got concerned about the atomicity of readv(). The manpage says
readv/writev are atomic with the exception noted in pipe(7), which I
take to mean this part:

       O_NONBLOCK disabled, n > PIPE_BUF
              The write is nonatomic: the data given to write(2) may be inter-
              leaved with write(2)s by  other  process;  the  write(2)  blocks
              until n bytes have been written.

What isn't clear is whether that applies only when a pipe is involved or
to any readv/writev operation. Well, the only concern would be the
behaviour of /dev/fuse here. But if you say they will be atomic, then
let's try that.

I will see if I can dig out the patch, update and complete it this
weekend.

MfG
        Goswin

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Goswin v. B. <gos...@we...> - 2011-11-27 17:25:31

Miklos Szeredi <mi...@sz...> writes:

> On Sat, Nov 19, 2011 at 11:57 PM, Goswin von Brederlow
> <gos...@we...> wrote:
>> Miklos Szeredi <mi...@sz...> writes:
>>
>>>>>> On Thu, Oct 27, 2011 at 11:03 PM, VenkataRao
>>>>>> <ven...@gm...> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > I'm kind of new to this fuse development and our file-system
>>>>>> > implementation requires the buffer address & file offsets to all read
>>>>>> > & write requests from lib-fuse to be aligned with 512 bytes.
>>>
>>> Following patch should fix the read buffer alignment.  Fixing the write
>>> buffer alignment properly would be more involved.
>>
>> For write buffers the buffer needs to be aligned to 512/4K - the size of
>> the header. Unfortunately there is no posix_memalign() equivalent to
>> allocate memory at an offset to alignment. But it isn't that hard
>> (although a bit wastefull) to allocate a bigger buffer and ignore the
>> first X bytes.
>>
>> Alternatively wouldn't it be possible to have 2 buffers, one for the
>> header, one for payload data, and user readv() to fill them both
>> atomically?
>
> Yes, those are possibilities.  The problem is, the parsing of the
> message is done on a different layer as the reading of the message.
> Which means it's pretty difficult to cleanly add those alignment
> constraints to the message reading part.
>
> But yes, it's doable.
>
> Thanks,
> Miklos

I updated my git clone and tried to update my old patch for alignable
and stealable buffers, but there has been quite a bit of bitrot. Looking
at the splice interface, I wonder if it is even worth doing this at all
now. Let's compare the two:

1) alignable and stealable buffers

- New functions: fuse_session_receive_bufv, fuse_session_process_bufv,
                 fuse_ll_receive_bufv, fuse_ll_process_bufv,
                 set_alignment, steal_buffer
- New fields in the session and request structures
- Rewrite fuse_loop, fuse_loop_mt and the worker structure

2) splice (no changes to libfuse)

- Set splice mode
- Allocate buffer yourself
- Call fuse_buf_copy

Having aligned buffers in libfuse would save at least 2 syscalls per
write request but the changes needed seem to be a bit much right now.

MfG
        Goswin

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Miklos S. <mi...@sz...> - 2011-12-05 12:25:14

On Sun, Nov 27, 2011 at 6:25 PM, Goswin von Brederlow <gos...@we...> wrote:
> I updated my git clone and tried to update my old patch for alignable
> and stealable buffers but there has been quite a bit of bitrot. Looking
> at the splice interface I wonder if it is even worth it now to do this
> at all. Lets compare the two:
>
> 1) alignable and stealable buffers
>
> - New functions: fuse_session_receive_bufv, fuse_session_process_bufv,
>                 fuse_ll_receive_bufv, fuse_ll_process_bufv,
>                 set_alignment, steal_buffer
> - New fields in the session and request structures
> - rewrite fuse_loop, fuse_loop_mt and worker structure
>
> 2) splice (no changes to libfuse)
>
> - Set splice mode
> - Allocate buffer yourself
> - Call fuse_buf_copy
>
> Having aligned buffers in libfuse would save at least 2 syscalls per
> write request but the changes needed seem to be a bit much right now.

Yes, the extra syscalls will hurt some workloads, but mainly because
they will be used on all requests not just the big write ones.  For
big writes the overhead of the two extra syscalls is negligible.

So I believe the right direction is to eliminate the overhead for
small requests while keeping the advantages of the splice interface
for handling big writes.

Thanks,
Miklos

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Goswin v. B. <gos...@we...> - 2011-12-05 19:22:48

Miklos Szeredi <mi...@sz...> writes:

> On Sun, Nov 27, 2011 at 6:25 PM, Goswin von Brederlow <gos...@we...> wrote:
>> I updated my git clone and tried to update my old patch for alignable
>> and stealable buffers but there has been quite a bit of bitrot. Looking
>> at the splice interface I wonder if it is even worth it now to do this
>> at all. Lets compare the two:
>>
>> 1) alignable and stealable buffers
>>
>> - New functions: fuse_session_receive_bufv, fuse_session_process_bufv,
>>                 fuse_ll_receive_bufv, fuse_ll_process_bufv,
>>                 set_alignment, steal_buffer
>> - New fields in the session and request structures
>> - rewrite fuse_loop, fuse_loop_mt and worker structure
>>
>> 2) splice (no changes to libfuse)
>>
>> - Set splice mode
>> - Allocate buffer yourself
>> - Call fuse_buf_copy
>>
>> Having aligned buffers in libfuse would save at least 2 syscalls per
>> write request but the changes needed seem to be a bit much right now.
>
> Yes, the extra syscalls will hurt some workloads, but mainly because
> they will be used on all requests not just the big write ones.  For
> big writes the overhead of the two extra syscalls is negligible.
>
> So I believe the right direction is to eliminate the overhead for
> small requests while keeping the advantages of the splice interface
> for handling big writes.
>
> Thanks,
> Miklos

But how? At the point where you do know the request type and length you
have already spliced the request and data into a pipe. And then it is
too late. You need the extra syscall to get the data out of the pipe
again.

MfG
        Goswin

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Miklos S. <mi...@sz...> - 2011-12-06 13:53:00

On Mon, Dec 5, 2011 at 8:22 PM, Goswin von Brederlow <gos...@we...> wrote:

> But how? At the point where you do know the request type and length you
> have already spliced the request and data into a pipe. And then it is
> too late. You need the extra syscall to get the data out of the pipe
> again.

One idea is to allow attaching multiple device file descriptors to the
mount: one on which only small requests are queued and one on which
only large requests.   Then the device instance with the small
requests is read directly, while the device instance with the large
request is spliced into a pipe.

I think there might be better ways to do this but haven't thought about it much.

Thanks,
Miklos

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Goswin v. B. <gos...@we...> - 2011-12-07 09:09:37

Miklos Szeredi <mi...@sz...> writes:

> On Mon, Dec 5, 2011 at 8:22 PM, Goswin von Brederlow <gos...@we...> wrote:
>
>> But how? At the point where you do know the request type and length you
>> have already spliced the request and data into a pipe. And then it is
>> too late. You need the extra syscall to get the data out of the pipe
>> again.
>
> One idea is to allow attaching multiple device file descriptors to the
> mount: one on which only small requests are queued and one on which
> only large requests.   Then the device instance with the small
> requests is read directly, while the device instance with the large
> request is spliced into a pipe.
>
> I think there might be better ways to do this but haven't thought about it much.
>
> Thanks,
> Miklos

I don't know how feasible this is, but could requests be split up into a
command fd and a data fd? The main thread would open the command fd and
each thread would request its own data fd over the command fd. Then, when
a thread reads a request from the command fd (and it is a big one), the
kernel would dump the data to the data fd of the requesting thread.

With this, fuse wouldn't even have to splice the data into an extra pipe
but could just pass the data fd to the callback as is.

MfG
        Goswin

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Miklos S. <mi...@sz...> - 2011-12-07 14:50:26

Goswin von Brederlow <gos...@we...> writes:

> I don't know how feasable this is, but: Could request be split up into a
> command fd and data fd? The main thread would open the command fd and
> each thread would request it own data fd over the command fd. Then when
> a thread reads a request from the command fd (and it is a big one) the
> kernel would dump the data to the data fd for the requesting thread.
>
> With this fuse wouldn't even have to splice the data into an extra pipe
> but could just pass the data fd to the callback as is.

The kernel should not need to care about threads and such.  It should
not contain knowledge about the implementation of libfuse.

So slightly modifying your idea: the kernel can attach a data fd
(basically the read end of a pipe) to the request and dump the data into
that pipe.

The question is: how to attach the data fd to the request?  File
descriptor passing over unix domain sockets (SCM_RIGHTS) comes to mind.
But changing the device interface into a socket interface is not quite
trivial.

Passing the file descriptor in the fuse_write_in header would be
possible, but that's really messy.

An ioctl() to request that the data be dumped onto the given pipe is
perhaps the cleanest way to do this.  E.g.

struct fuse_data_request {
	u64 unique;
	int fd;
};

static void do_write()
{
	struct fuse_data_request fdr;
	int pip[2];
	...
	pipe(pip);
	fdr.unique = req->unique;
	fdr.fd = pip[1];
	ioctl(fuse_dev, FUSE_IOCTL_GETDATA, &fdr);
	/* data available in pip[0] */
	...
}

Thanks,
Miklos

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Goswin v. B. <gos...@we...> - 2011-12-08 10:39:37

Miklos Szeredi <mi...@sz...> writes:

> Goswin von Brederlow <gos...@we...> writes:
>
>> I don't know how feasable this is, but: Could request be split up into a
>> command fd and data fd? The main thread would open the command fd and
>> each thread would request it own data fd over the command fd. Then when
>> a thread reads a request from the command fd (and it is a big one) the
>> kernel would dump the data to the data fd for the requesting thread.
>>
>> With this fuse wouldn't even have to splice the data into an extra pipe
>> but could just pass the data fd to the callback as is.
>
> The kernel should not need to care about threads and such.  It should
> not contain knowledge about the implementation of libfuse.
>
> So slightly modifying your idea: the kernel can attach a data fd
> (basically the read end of a pipe) to the request and dump the data into
> that pipe.

How expensive would opening and closing all those pipes be?

> The question is: how to attach the data fd to the request?  File
> descriptor passing over unix domain sockets (SCM_RIGHTS) comes to mind.
> But changing the device interface into a socket interface is not quite
> trivial.
>
> Passing the file descriptor in the fuse_write_in header would be
> possible, but that's really messy.
>
> An ioctl() to request that the data be dumped onto the given pipe is
> perhaps the cleanest way to do this.  E.g.
>
> struct fuse_data_request {
> 	u64 unique;
>         int fd;
> };
>
> static void do_write()
> {
> 	struct fuse_data_request fdr;
>         int pip[2];
> 	...
>         pipe(pip);
>         fdr.unique = req->unique;
>         fdr.fd = pip[1];
> 	ioctl(fuse_dev, FUSE_IOCTL_GETDATA, &fdr);
>         /* data available in pip[0] */
>         ...
> }
>
> Thanks,
> Miklos

And a feature negotiation to tell the kernel to send writes (above a
certain size) without payload. It would add another syscall, but given
the size of the data this would be used for, that is probably negligible.

As a plus, this could easily allow write requests over 128k without
libfuse having to waste HUGE buffers on trivial requests. And you could
reuse the pipes.

MfG
        Goswin

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Goswin v. B. <gos...@we...> - 2011-12-08 10:54:00

Miklos Szeredi <mi...@sz...> writes:

> Goswin von Brederlow <gos...@we...> writes:
>
>> I don't know how feasable this is, but: Could request be split up into a
>> command fd and data fd? The main thread would open the command fd and
>> each thread would request it own data fd over the command fd. Then when
>> a thread reads a request from the command fd (and it is a big one) the
>> kernel would dump the data to the data fd for the requesting thread.
>>
>> With this fuse wouldn't even have to splice the data into an extra pipe
>> but could just pass the data fd to the callback as is.
>
> The kernel should not need to care about threads and such.  It should
> not contain knowledge about the implementation of libfuse.
>
> So slightly modifying your idea: the kernel can attach a data fd
> (basically the read end of a pipe) to the request and dump the data into
> that pipe.
>
> The question is: how to attach the data fd to the request?  File
> descriptor passing over unix domain sockets (SCM_RIGHTS) comes to mind.
> But changing the device interface into a socket interface is not quite
> trivial.
>
> Passing the file descriptor in the fuse_write_in header would be
> possible, but that's really messy.
>
> An ioctl() to request that the data be dumped onto the given pipe is
> perhaps the cleanest way to do this.  E.g.
>
> struct fuse_data_request {
> 	u64 unique;
>         int fd;
> };
>
> static void do_write()
> {
> 	struct fuse_data_request fdr;
>         int pip[2];
> 	...
>         pipe(pip);
>         fdr.unique = req->unique;
>         fdr.fd = pip[1];
> 	ioctl(fuse_dev, FUSE_IOCTL_GETDATA, &fdr);
>         /* data available in pip[0] */
>         ...
> }
>
> Thanks,
> Miklos

One more thing I forgot.

Does the FD have to be a pipe? Think of an overlay filesystem. Why not
pass the FD of the underlying file:

struct fuse_data_request {
	u64 unique;
	u64 offset;
	int fd;
};

static void do_write(...)
{
	struct fuse_data_request fdr;
	fdr.unique = req->unique;
	fdr.offset = off;
	fdr.fd = fi->fh;
	ioctl(fuse_dev, FUSE_IOCTL_GETDATA, &fdr);
}

Plus some error handling.

MfG
        Goswin

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Miklos S. <mi...@sz...> - 2011-12-08 11:23:21

Goswin von Brederlow <gos...@we...> writes:

>
> One more thing I forgot.
>
> Does the FD have to be a pipe? Think of an overlay filesystem. Why not
> pass the FD of the underlying file:
>
> struct fuse_data_request {
> 	u64 unique;
>         u64 offset;
>         int fd;
> };
>
> static void do_write(...)
> {
> 	struct fuse_data_request fdr;
>         fdr.unique = req->unique;
>         fdr.offset = off;
>         fdr.fd = fi->fh;
> 	ioctl(fuse_dev, FUSE_IOCTL_GETDATA, &fdr);
> }
>

Right, and observe how that ioctl is almost like a splice() now.  The
only problem is that the request must be transferred from the device
with one syscall otherwise there's nothing to identify parts of the
message and so it cannot be connected up.

But if we manage to drop that requirement everything becomes much
easier.

So what about introducing a new mode of operation where there may be
multiple fuse device fd's assigned to the filesystem and when a read is
attempted on one of them the request is assigned to that particular
device instance and the rest of the request can be transferred with an
arbitrary number of syscalls.

In other words, just drop the thread safety requirement from the device
fd.  This is similar to your first proposal, but there's now no "command
fd" and "data fd" which would again have problems with having to connect
the pieces of the request up somehow.

I like this because

  a) it doesn't require any special method (ioctl's are to be avoided
  when possible),

  b) it's pretty close to what is currently done, so no heavy
  modification is needed on either the kernel or the userspace side.

And then to allow optimizing the write() requests there could be an INIT
flag saying that for certain messages (e.g. write, setxattr) only return
the header in the first read() but connect up the rest of the
request with that instance.

So the write would just become:

   splice(dev_fd_instance, NULL, pip[1], NULL, arg->size, 0);
   /* data is available from pip[0] */

And at some stage if it looks like a worthwhile optimization the fuse
device could itself acquire pipe-like properties, so splicing from it to
any destination would become possible.  But that's not trivial at all,
and not entirely sure it's even worth doing.

The only question remaining is how to create a new device instance.  One
idea would be:

   new_dev_fd_inst = open("/dev/fuse", ...);
   ioctl(new_dev_fd_inst, FUSE_IOCTL_CONNECT_DEV, orig_dev_fd);

Is there a better way to do this not involving ioctls?

Thanks,
Miklos

Re: [fuse-devel] Sector aligned address & offsets to all read/write requests from lib-fuse

From: Goswin v. B. <gos...@we...> - 2011-12-08 17:30:21

Miklos Szeredi <mi...@sz...> writes:

> Goswin von Brederlow <gos...@we...> writes:
>
>>
>> One more thing I forgot.
>>
>> Does the FD have to be a pipe? Think of an overlay filesystem. Why not
>> pass the FD of the underlying file:
>>
>> struct fuse_data_request {
>>         u64 unique;
>>         u64 offset;
>>         int fd;
>> };
>>
>> static void do_write(...)
>> {
>>         struct fuse_data_request fdr;
>>         fdr.unique = req->unique;
>>         fdr.offset = off;
>>         fdr.fd = fi->fh;
>>         ioctl(fuse_dev, FUSE_IOCTL_GETDATA, &fdr);
>> }
>>
>
> Right, and observe how that ioctl is almost like a splice() now.  The
> only problem is that the request must be transferred from the device
> with one syscall otherwise there's nothing to identify parts of the
> message and so it cannot be connected up.
>
> But if we manage to drop that requirement everything becomes much
> easier.

Requiring that the whole block of data be transferred in one go should
make things easier, but I don't think splitting it up has to be a
problem either.

Options:

1) add offset and size to the ioctl and call it multiple times to get
chunks of the payload

2) add an iovec structure like (p)writev uses to allow writing out the
payload in fragments.

Probably many more.
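
As a rough illustration of option 2 (this is not an existing FUSE
interface), here is a minimal sketch of how a filesystem could store a
payload that arrives as several fragments with a single pwritev() call.
The function name write_fragments() is made up for this example, and a
short write is simply treated as an error to keep the sketch small.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the fragments described by iov[0..iovcnt-1]
 * to the file at `path`, starting at file offset `offset`.
 * Returns 0 on success, -1 on any error or short write.
 */
int write_fragments(const char *path, off_t offset,
                    const struct iovec *iov, int iovcnt)
{
        size_t total = 0;
        ssize_t n;
        int fd, rc;

        assert(path != NULL && iov != NULL && iovcnt > 0);

        /* Total number of bytes we expect pwritev() to write. */
        for (int i = 0; i < iovcnt; i++)
                total += iov[i].iov_len;

        fd = open(path, O_WRONLY | O_CREAT, 0600);
        if (fd < 0)
                return -1;

        /* One syscall writes all fragments back to back at `offset`. */
        n = pwritev(fd, iov, iovcnt, offset);
        rc = (n == (ssize_t)total) ? 0 : -1;

        if (close(fd) != 0)
                rc = -1;
        return rc;
}
```

Option 1 would look the same from the filesystem's side, except that
each chunk is fetched and written separately, with the file offset
advanced by the chunk size each time.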

> So what about introducing a new mode of operation where there may be
> multiple fuse device fd's assigned to the filesystem and when a read is
> attempted on one of them the request is assigned to that particular
> device instance and the rest of the request can be transferred with an
> arbitrary number of syscalls.

You mean you change nothing in the protocol and libfuse may simply do
(in pseudocode):

read(fuse_fd, req, sizeof(req));
if (req->cmd == WRITE) {
  splice(fuse_fd, 0, my_pipe, 0, req->len, flags);
  ...
}
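
To make the pattern concrete, here is a minimal sketch that reads a
fixed-size header and then splices the payload onward. A plain pipe
stands in for the fuse device fd, and struct demo_req_header and
DEMO_WRITE are invented for the example; they are not the real FUSE
header or opcode.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request header, much simpler than the real FUSE one. */
struct demo_req_header {
        uint64_t unique;  /* request id */
        uint32_t opcode;  /* DEMO_WRITE or anything else */
        uint32_t len;     /* payload bytes following the header */
};

enum { DEMO_WRITE = 1 };

/* Read exactly n bytes; returns 0 on success, -1 on error or EOF. */
static int read_full(int fd, void *buf, size_t n)
{
        char *p = buf;

        while (n > 0) {
                ssize_t r = read(fd, p, n);
                if (r <= 0)
                        return -1;
                p += r;
                n -= (size_t)r;
        }
        return 0;
}

/*
 * Read one header from src_fd (a pipe standing in for the device fd).
 * For a write request, move the payload to dst_fd with splice() so it
 * never passes through a user-space buffer.
 * Returns the number of payload bytes moved, 0 for other requests,
 * or -1 on error.
 */
ssize_t handle_one_request(int src_fd, int dst_fd)
{
        struct demo_req_header hdr;
        size_t left;

        assert(src_fd >= 0 && dst_fd >= 0);

        if (read_full(src_fd, &hdr, sizeof(hdr)) != 0)
                return -1;
        if (hdr.opcode != DEMO_WRITE)
                return 0;

        /* splice() may move fewer bytes than asked for, so loop. */
        for (left = hdr.len; left > 0; ) {
                ssize_t n = splice(src_fd, NULL, dst_fd, NULL, left,
                                   SPLICE_F_MOVE);
                if (n <= 0)
                        return -1;
                left -= (size_t)n;
        }
        return (ssize_t)hdr.len;
}
```

Note that this only works if nothing else reads from src_fd between the
header read() and the splice() calls, which is exactly the per-fd
ownership being discussed here.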

> In other words, just drop the thread safety requirement from the device
> fd.  This is similar to your first proposal, but there's now no "command
> fd" and "data fd" which would again have problems with having to connect
> the pieces of the request up somehow.

If you do this with ioctl() then you don't need multiple FDs, nor do you
have to give up the thread safety. The "unique" in the fuse_data_request
would take care of connecting the payload with the request you care
about. I think that would require the least amount of change.

> I like this because
>
>   a) it doesn't require any special method (ioctl's are to be avoided
>   when possible),
>
>   b) it's pretty close to what is currently done, so no heavy
>   modification is needed on either the kernel or the userspace side.

Not everyone uses the libfuse loop function, and if you have your own
multithreaded loop then you need to rewrite that to have one FD per
thread. You need to keep backward compatibility, so the feature has to
be negotiated on startup. But I think you already have that well in
mind. :)

> And then to allow optimizing the write() requests there could be an INIT
> flag saying that for certain messages (e.g. write, setxattr) only return
> the header in the first read() but connect up the rest of the
> request with that instance.
>
> So the write would just become:
>
>    splice(dev_fd_instance, NULL, pip[1], NULL, arg->size, 0);
>    /* data is available from pip[0] */
>
> And at some stage if it looks like a worthwhile optimization the fuse
> device could itself acquire pipe-like properties, so splicing from it to
> any destination would become possible.  But that's not trivial at all,
> and not entirely sure it's even worth doing.

I still think it might be worth it to support small inlined writes and
large external ones. Your way with ioctl() would make that simple to
implement. You tell the kernel the buffer size for request + payload. If
a request has more payload than would fit in the buffer, it sends just
the request with a flag set to indicate that the payload has to be
retrieved separately via ioctl().

That way small requests (e.g. < 4k payload) would use just a single
read() and large requests read() + ioctl(). I'm not sure at what size
the break-even point would be, but splicing 200 bytes for a setattr
through a pipe can't be faster than memcpy()ing them.

> The only question remaining is how to create a new device instance.  One
> idea would be:
>
>    new_dev_fd_inst = open("/dev/fuse", ...);
>    ioctl(new_dev_fd_inst, FUSE_IOCTL_CONNECT_DEV, orig_dev_fd);
>
> Is there a better way to do this not involving ioctls?

Can you catch the dup() syscall to make the new FD a new instance
connected to the same fs?

Or have a dup()-like ioctl that returns a new FD:

    new_dev_fd_inst = ioctl(orig_dev_fd, FUSE_DUP_FD);

MfG
        Goswin