From: Maxim V. P. <MPa...@pa...> - 2012-12-10 07:41:15
Hi,

The existing fuse implementation always processes direct IO synchronously: it submits the next request to userspace fuse only when the previous one has completed. This is suboptimal because:

1) libaio DIO works in a blocking way;
2) userspace fuse can't achieve parallelism by processing several requests simultaneously (e.g. in case of distributed network storage);
3) userspace fuse can't merge requests before passing them to the actual storage.

The idea of the patch-set is to submit fuse requests in a non-blocking way (where possible) and either return -EIOCBQUEUED or wait for their completion synchronously. The patch-set applies on top of the for-next branch of Miklos' git repo.

To estimate the performance improvement I used a slightly modified fusexmp over tmpfs (clearing the O_DIRECT bit from fi->flags in xmp_open). For synchronous operations I used 'dd' like this:

  dd of=/dev/null if=/fuse/mnt/file bs=2M count=256 iflag=direct
  dd if=/dev/zero of=/fuse/mnt/file bs=2M count=256 oflag=direct conv=notrunc

For AIO I used 'aio-stress' like this:

  aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 1 /fuse/mnt/file
  aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 0 /fuse/mnt/file

The throughput on some commodity (rather feeble) server was (in MB/sec):

              original / patched
  dd reads:     ~322   /   ~382
  dd writes:    ~277   /   ~288
  aio reads:    ~380   /   ~459
  aio writes:   ~319   /   ~353

Thanks,
Maxim

---

Maxim V. Patlasov (6):
      fuse: move fuse_release_user_pages() up
      fuse: add support of async IO
      fuse: make fuse_direct_io() aware about AIO
      fuse: enable asynchronous processing direct IO
      fuse: truncate file if async dio failed
      fuse: optimize short direct reads


 fs/fuse/cuse.c   |    4 -
 fs/fuse/file.c   |  276 ++++++++++++++++++++++++++++++++++++++++++++++++------
 fs/fuse/fuse_i.h |   17 +++
 3 files changed, 262 insertions(+), 35 deletions(-)
From: Maxim V. P. <MPa...@pa...> - 2012-12-10 07:41:27
fuse_release_user_pages() will be indirectly used by fuse_send_read/write in future patches.

Signed-off-by: Maxim Patlasov <mpa...@pa...>
---
 fs/fuse/file.c |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 19b50e7..6685cb0 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -491,6 +491,18 @@ void fuse_read_fill(struct fuse_req *req, struct file *file, loff_t pos,
 	req->out.args[0].size = count;
 }
 
+static void fuse_release_user_pages(struct fuse_req *req, int write)
+{
+	unsigned i;
+
+	for (i = 0; i < req->num_pages; i++) {
+		struct page *page = req->pages[i];
+		if (write)
+			set_page_dirty_lock(page);
+		put_page(page);
+	}
+}
+
 static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 			     loff_t pos, size_t count, fl_owner_t owner)
 {
@@ -1035,18 +1047,6 @@ out:
 	return written ? written : err;
 }
 
-static void fuse_release_user_pages(struct fuse_req *req, int write)
-{
-	unsigned i;
-
-	for (i = 0; i < req->num_pages; i++) {
-		struct page *page = req->pages[i];
-		if (write)
-			set_page_dirty_lock(page);
-		put_page(page);
-	}
-}
-
 static inline void fuse_page_descs_length_init(struct fuse_req *req,
 					       unsigned index, unsigned nr_pages)
 {
From: Maxim V. P. <MPa...@pa...> - 2012-12-10 07:41:38
The patch implements a framework to process an IO request asynchronously. The idea is to associate several fuse requests with a single kiocb by means of the fuse_io_priv structure. The structure plays the same role for FUSE as 'struct dio' does for direct-io.c.

The framework is supposed to be used like this:

- someone (who wants to process an IO asynchronously) allocates fuse_io_priv, initializes it and saves it in kiocb->private;
- as soon as a fuse request is filled, it can be submitted (in a non-blocking way) by fuse_async_req_send();
- when all submitted requests are ACKed by userspace, io->reqs drops to zero, triggering aio_complete().

In case of IO initiated by libaio, aio_complete() will finish processing the same way as dio_complete() does when it calls aio_complete(). But the framework may also be used for internal FUSE purposes when the initial IO request was synchronous (from the user's perspective) but it is beneficial to process it asynchronously. Then the caller should wait on the kiocb explicitly and aio_complete() will wake the caller up.

Signed-off-by: Maxim Patlasov <mpa...@pa...>
---
 fs/fuse/file.c   |   94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/fuse_i.h |   15 +++++++++
 2 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 6685cb0..634f54a 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -503,6 +503,100 @@ static void fuse_release_user_pages(struct fuse_req *req, int write)
 	}
 }
 
+/**
+ * In case of short read, the caller sets 'pos' to the position of
+ * actual end of fuse request in IO request. Otherwise, if bytes_requested
+ * == bytes_transferred or rw == WRITE, the caller sets 'pos' to -1.
+ *
+ * An example:
+ * User requested DIO read of 64K. It was splitted into two 32K fuse requests,
+ * both submitted asynchronously. The first of them was ACKed by userspace as
+ * fully completed (req->out.args[0].size == 32K) resulting in pos == -1. The
+ * second request was ACKed as short, e.g. only 1K was read, resulting in
+ * pos == 33K.
+ *
+ * Thus, when all fuse requests are completed, the minimal non-negative 'pos'
+ * will be equal to the length of the longest contiguous fragment of
+ * transferred data starting from the beginning of IO request.
+ */
+static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos)
+{
+	int left;
+
+	spin_lock(&io->lock);
+	if (err)
+		io->err = io->err ? : err;
+	else if (pos >= 0 && (io->bytes < 0 || pos < io->bytes))
+		io->bytes = pos;
+
+	left = --io->reqs;
+	spin_unlock(&io->lock);
+
+	if (!left) {
+		long res;
+
+		if (io->err)
+			res = io->err;
+		else if (io->bytes >= 0 && io->write)
+			res = -EIO;
+		else {
+			res = io->bytes < 0 ? io->size : io->bytes;
+
+			if (!is_sync_kiocb(io->iocb)) {
+				struct path *path = &io->iocb->ki_filp->f_path;
+				struct inode *inode = path->dentry->d_inode;
+				struct fuse_conn *fc = get_fuse_conn(inode);
+				struct fuse_inode *fi = get_fuse_inode(inode);
+
+				spin_lock(&fc->lock);
+				fi->attr_version = ++fc->attr_version;
+				spin_unlock(&fc->lock);
+			}
+		}
+
+		aio_complete(io->iocb, res, 0);
+		kfree(io);
+	}
+}
+
+static void fuse_aio_complete_req(struct fuse_conn *fc, struct fuse_req *req)
+{
+	struct fuse_io_priv *io = req->io;
+	ssize_t pos = -1;
+
+	fuse_release_user_pages(req, !io->write);
+
+	if (io->write) {
+		if (req->misc.write.in.size != req->misc.write.out.size)
+			pos = req->misc.write.in.offset - io->offset +
+				req->misc.write.out.size;
+	} else {
+		if (req->misc.read.in.size != req->out.args[0].size)
+			pos = req->misc.read.in.offset - io->offset +
+				req->out.args[0].size;
+	}
+
+	fuse_aio_complete(io, req->out.h.error, pos);
+}
+
+static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req,
+				  size_t num_bytes, struct kiocb *iocb)
+{
+	struct fuse_io_priv *io = iocb->private;
+
+	spin_lock(&io->lock);
+	io->size += num_bytes;
+	io->reqs++;
+	spin_unlock(&io->lock);
+
+	req->io = io;
+	req->end = fuse_aio_complete_req;
+
+	fuse_request_send_background(fc, req);
+
+	return num_bytes;
+}
+
 static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 			     loff_t pos, size_t count, fl_owner_t owner)
 {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e4f70ea..618d48a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -219,6 +219,18 @@ enum fuse_req_state {
 	FUSE_REQ_FINISHED
 };
 
+/** The request IO state (for asynchronous processing) */
+struct fuse_io_priv {
+	spinlock_t lock;
+	unsigned reqs;
+	ssize_t bytes;
+	size_t size;
+	__u64 offset;
+	bool write;
+	int err;
+	struct kiocb *iocb;
+};
+
 /**
  * A request to the client
  */
@@ -323,6 +335,9 @@ struct fuse_req {
 	/** Inode used in the request or NULL */
 	struct inode *inode;
 
+	/** AIO control block */
+	struct fuse_io_priv *io;
+
 	/** Link on fi->writepages */
 	struct list_head writepages_entry;
From: Maxim V. P. <MPa...@pa...> - 2012-12-10 07:41:48
The patch implements passing "struct kiocb *async" down the stack up to fuse_send_read/write, where it is used to submit the request asynchronously. async == NULL designates synchronous processing.

The non-trivial part of the patch is the changes in fuse_direct_io(): resources like fuse requests and user pages cannot be released immediately in the async case.

Signed-off-by: Maxim Patlasov <mpa...@pa...>
---
 fs/fuse/cuse.c   |    4 ++--
 fs/fuse/file.c   |   58 ++++++++++++++++++++++++++++++++++++------------------
 fs/fuse/fuse_i.h |    2 +-
 3 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 65ce10a..beb99e9 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -93,7 +93,7 @@ static ssize_t cuse_read(struct file *file, char __user *buf, size_t count,
 	loff_t pos = 0;
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
 
-	return fuse_direct_io(file, &iov, 1, count, &pos, 0);
+	return fuse_direct_io(file, &iov, 1, count, &pos, 0, NULL);
 }
 
 static ssize_t cuse_write(struct file *file, const char __user *buf,
@@ -106,7 +106,7 @@ static ssize_t cuse_write(struct file *file, const char __user *buf,
 	 * No locking or generic_write_checks(), the server is
 	 * responsible for locking and sanity checks.
 	 */
-	return fuse_direct_io(file, &iov, 1, count, &pos, 1);
+	return fuse_direct_io(file, &iov, 1, count, &pos, 1, NULL);
 }
 
 static int cuse_open(struct inode *inode, struct file *file)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 634f54a..c585158 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -598,7 +598,8 @@ static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req,
 }
 
 static size_t fuse_send_read(struct fuse_req *req, struct file *file,
-			     loff_t pos, size_t count, fl_owner_t owner)
+			     loff_t pos, size_t count, fl_owner_t owner,
+			     struct kiocb *async)
 {
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
@@ -610,6 +611,10 @@ static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 		inarg->read_flags |= FUSE_READ_LOCKOWNER;
 		inarg->lock_owner = fuse_lock_owner_id(fc, owner);
 	}
+
+	if (async)
+		return fuse_async_req_send(fc, req, count, async);
+
 	fuse_request_send(fc, req);
 	return req->out.args[0].size;
 }
@@ -662,7 +667,7 @@ static int fuse_readpage(struct file *file, struct page *page)
 	req->num_pages = 1;
 	req->pages[0] = page;
 	req->page_descs[0].length = count;
-	num_read = fuse_send_read(req, file, pos, count, NULL);
+	num_read = fuse_send_read(req, file, pos, count, NULL, NULL);
 	err = req->out.h.error;
 	fuse_put_request(fc, req);
 
@@ -865,7 +870,8 @@ static void fuse_write_fill(struct fuse_req *req, struct fuse_file *ff,
 }
 
 static size_t fuse_send_write(struct fuse_req *req, struct file *file,
-			      loff_t pos, size_t count, fl_owner_t owner)
+			      loff_t pos, size_t count, fl_owner_t owner,
+			      struct kiocb *async)
 {
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
@@ -877,6 +883,10 @@ static size_t fuse_send_write(struct fuse_req *req, struct file *file,
 		inarg->write_flags |= FUSE_WRITE_LOCKOWNER;
 		inarg->lock_owner = fuse_lock_owner_id(fc, owner);
 	}
+
+	if (async)
+		return fuse_async_req_send(fc, req, count, async);
+
 	fuse_request_send(fc, req);
 	return req->misc.write.out.size;
 }
@@ -904,7 +914,7 @@ static size_t fuse_send_write_pages(struct fuse_req *req, struct file *file,
 	for (i = 0; i < req->num_pages; i++)
 		fuse_wait_on_page_writeback(inode, req->pages[i]->index);
 
-	res = fuse_send_write(req, file, pos, count, NULL);
+	res = fuse_send_write(req, file, pos, count, NULL, NULL);
 
 	offset = req->page_descs[0].offset;
 	count = res;
@@ -1244,7 +1254,7 @@ static inline int fuse_iter_npages(const struct iov_iter *ii_p)
 
 ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		       unsigned long nr_segs, size_t count, loff_t *ppos,
-		       int write)
+		       int write, struct kiocb *async)
 {
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
@@ -1266,16 +1276,22 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		size_t nbytes = min(count, nmax);
 		int err = fuse_get_user_pages(req, &ii, &nbytes, write);
 		if (err) {
+			if (async)
+				fuse_put_request(fc, req);
+
 			res = err;
 			break;
 		}
 
 		if (write)
-			nres = fuse_send_write(req, file, pos, nbytes, owner);
+			nres = fuse_send_write(req, file, pos, nbytes, owner,
+					       async);
 		else
-			nres = fuse_send_read(req, file, pos, nbytes, owner);
+			nres = fuse_send_read(req, file, pos, nbytes, owner,
+					      async);
 
-		fuse_release_user_pages(req, !write);
+		if (!async)
+			fuse_release_user_pages(req, !write);
 		if (req->out.h.error) {
 			if (!res)
 				res = req->out.h.error;
@@ -1290,13 +1306,14 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		if (nres != nbytes)
 			break;
 		if (count) {
-			fuse_put_request(fc, req);
+			if (!async)
+				fuse_put_request(fc, req);
 			req = fuse_get_req(fc, fuse_iter_npages(&ii));
 			if (IS_ERR(req))
 				break;
 		}
 	}
-	if (!IS_ERR(req))
+	if (!IS_ERR(req) && !async)
 		fuse_put_request(fc, req);
 	if (res > 0)
 		*ppos = pos;
@@ -1306,7 +1323,8 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 EXPORT_SYMBOL_GPL(fuse_direct_io);
 
 static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
-				  unsigned long nr_segs, loff_t *ppos)
+				  unsigned long nr_segs, loff_t *ppos,
+				  struct kiocb *async)
 {
 	ssize_t res;
 	struct inode *inode = file->f_path.dentry->d_inode;
@@ -1315,7 +1333,7 @@ static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
 		return -EIO;
 
 	res = fuse_direct_io(file, iov, nr_segs, iov_length(iov, nr_segs),
-			     ppos, 0);
+			     ppos, 0, async);
 
 	fuse_invalidate_attr(inode);
 
@@ -1326,11 +1344,12 @@ static ssize_t fuse_direct_read(struct file *file, char __user *buf,
 				size_t count, loff_t *ppos)
 {
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
-	return __fuse_direct_read(file, &iov, 1, ppos);
+	return __fuse_direct_read(file, &iov, 1, ppos, NULL);
 }
 
 static ssize_t __fuse_direct_write(struct file *file, const struct iovec *iov,
-				   unsigned long nr_segs, loff_t *ppos)
+				   unsigned long nr_segs, loff_t *ppos,
+				   struct kiocb *async)
 {
 	struct inode *inode = file->f_path.dentry->d_inode;
 	size_t count = iov_length(iov, nr_segs);
@@ -1338,8 +1357,9 @@ static ssize_t __fuse_direct_write(struct file *file, const struct iovec *iov,
 
 	res = generic_write_checks(file, ppos, &count, 0);
 	if (!res) {
-		res = fuse_direct_io(file, iov, nr_segs, count, ppos, 1);
-		if (res > 0)
+		res = fuse_direct_io(file, iov, nr_segs, count, ppos, 1,
+				     async);
+		if (!async && res > 0)
 			fuse_write_update_size(inode, *ppos);
 	}
 
@@ -1360,7 +1380,7 @@ static ssize_t fuse_direct_write(struct file *file, const char __user *buf,
 
 	/* Don't allow parallel writes to the same file */
 	mutex_lock(&inode->i_mutex);
-	res = __fuse_direct_write(file, &iov, 1, ppos);
+	res = __fuse_direct_write(file, &iov, 1, ppos, NULL);
 	mutex_unlock(&inode->i_mutex);
 
 	return res;
@@ -2333,9 +2353,9 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	pos = offset;
 
 	if (rw == WRITE)
-		ret = __fuse_direct_write(file, iov, nr_segs, &pos);
+		ret = __fuse_direct_write(file, iov, nr_segs, &pos, NULL);
 	else
-		ret = __fuse_direct_read(file, iov, nr_segs, &pos);
+		ret = __fuse_direct_read(file, iov, nr_segs, &pos, NULL);
 
 	return ret;
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 618d48a..173c959 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -828,7 +828,7 @@ int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
 		 bool isdir);
 ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		       unsigned long nr_segs, size_t count, loff_t *ppos,
-		       int write);
+		       int write, struct kiocb *async);
 long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg,
 		   unsigned int flags);
 long fuse_ioctl_common(struct file *file, unsigned int cmd,
From: Maxim V. P. <MPa...@pa...> - 2012-12-10 07:42:01
In case of a synchronous DIO request (i.e. read(2) or write(2) on a file opened with O_DIRECT), the patch submits fuse requests asynchronously, but waits for their completions before returning from fuse_direct_IO().

In case of an asynchronous DIO request (i.e. libaio io_submit() on a file opened with O_DIRECT), the patch submits fuse requests asynchronously and returns -EIOCBQUEUED immediately.

The only special case is an async DIO extending the file. Here the patch falls back to the old behaviour, because we can't return -EIOCBQUEUED and update i_size later without i_mutex held.

Signed-off-by: Maxim Patlasov <mpa...@pa...>
---
 fs/fuse/file.c |   44 ++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index c585158..ef6d3de 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2348,14 +2348,54 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	ssize_t ret = 0;
 	struct file *file = NULL;
 	loff_t pos = 0;
+	struct inode *inode;
+	loff_t i_size;
+	size_t count = iov_length(iov, nr_segs);
+	struct kiocb *async_cb = NULL;
 
 	file = iocb->ki_filp;
 	pos = offset;
 
+	inode = file->f_mapping->host;
+	i_size = i_size_read(inode);
+
+	/* cannot write beyond eof asynchronously */
+	if (is_sync_kiocb(iocb) || (offset + count <= i_size) || rw != WRITE) {
+		struct fuse_io_priv *io;
+
+		io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
+		if (!io)
+			return -ENOMEM;
+
+		spin_lock_init(&io->lock);
+		io->reqs = 1;
+		io->bytes = -1;
+		io->size = 0;
+		io->offset = offset;
+		io->write = (rw == WRITE);
+		io->err = 0;
+		io->iocb = iocb;
+		iocb->private = io;
+
+		async_cb = iocb;
+	}
+
 	if (rw == WRITE)
-		ret = __fuse_direct_write(file, iov, nr_segs, &pos, NULL);
+		ret = __fuse_direct_write(file, iov, nr_segs, &pos, async_cb);
 	else
-		ret = __fuse_direct_read(file, iov, nr_segs, &pos, NULL);
+		ret = __fuse_direct_read(file, iov, nr_segs, &pos, async_cb);
+
+	if (async_cb) {
+		fuse_aio_complete(async_cb->private, ret == count ? 0 : -EIO,
+				  -1);
+
+		if (!is_sync_kiocb(iocb))
+			return -EIOCBQUEUED;
+
+		ret = wait_on_sync_kiocb(iocb);
+
+		if (rw == WRITE && ret > 0)
+			fuse_write_update_size(inode, pos);
+	}
 
 	return ret;
 }
From: Maxim V. P. <MPa...@pa...> - 2012-12-10 07:42:08
The patch improves error handling in fuse_direct_IO(): if we successfully submitted several fuse requests on behalf of a synchronous direct write extending the file and some of them failed, let's try to do our best to clean up.

Signed-off-by: Maxim Patlasov <mpa...@pa...>
---
 fs/fuse/file.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index ef6d3de..3e0fdb7 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2341,6 +2341,53 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
 	return 0;
 }
 
+static void fuse_do_truncate(struct file *file)
+{
+	struct fuse_file *ff = file->private_data;
+	struct inode *inode = file->f_mapping->host;
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_req *req;
+	struct fuse_setattr_in inarg;
+	struct fuse_attr_out outarg;
+	int err;
+
+	req = fuse_get_req_nopages(fc);
+	if (IS_ERR(req)) {
+		printk(KERN_WARNING "failed to allocate req for truncate "
+		       "(%ld)\n", PTR_ERR(req));
+		return;
+	}
+
+	memset(&inarg, 0, sizeof(inarg));
+	memset(&outarg, 0, sizeof(outarg));
+
+	inarg.valid |= FATTR_SIZE;
+	inarg.size = i_size_read(inode);
+
+	inarg.valid |= FATTR_FH;
+	inarg.fh = ff->fh;
+
+	req->in.h.opcode = FUSE_SETATTR;
+	req->in.h.nodeid = get_node_id(inode);
+	req->in.numargs = 1;
+	req->in.args[0].size = sizeof(inarg);
+	req->in.args[0].value = &inarg;
+	req->out.numargs = 1;
+	if (fc->minor < 9)
+		req->out.args[0].size = FUSE_COMPAT_ATTR_OUT_SIZE;
+	else
+		req->out.args[0].size = sizeof(outarg);
+	req->out.args[0].value = &outarg;
+
+	fuse_request_send(fc, req);
+	err = req->out.h.error;
+	fuse_put_request(fc, req);
+
+	if (err)
+		printk(KERN_WARNING "failed to truncate to %lld with error "
+		       "%d\n", i_size_read(inode), err);
+}
+
 static ssize_t
 fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	       loff_t offset, unsigned long nr_segs)
@@ -2393,8 +2440,12 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 
 		ret = wait_on_sync_kiocb(iocb);
 
-		if (rw == WRITE && ret > 0)
-			fuse_write_update_size(inode, pos);
+		if (rw == WRITE) {
+			if (ret > 0)
+				fuse_write_update_size(inode, pos);
+			else if (ret < 0 && offset + count > i_size)
+				fuse_do_truncate(file);
+		}
 	}
 
 	return ret;
From: Maxim V. P. <MPa...@pa...> - 2012-12-10 07:42:21
If the user requested a direct read beyond EOF, we can skip sending fuse requests for positions beyond EOF, because userspace would ACK them with zero bytes read anyway. We can trust i_size in fuse_direct_IO() for such cases because it's called from fuse_file_aio_read() and the latter updates fuse attributes, including i_size.

Signed-off-by: Maxim Patlasov <mpa...@pa...>
---
 fs/fuse/file.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 3e0fdb7..d2094e1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1324,7 +1324,7 @@ EXPORT_SYMBOL_GPL(fuse_direct_io);
 
 static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
 				  unsigned long nr_segs, loff_t *ppos,
-				  struct kiocb *async)
+				  struct kiocb *async, size_t count)
 {
 	ssize_t res;
 	struct inode *inode = file->f_path.dentry->d_inode;
@@ -1332,8 +1332,7 @@ static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
 	if (is_bad_inode(inode))
 		return -EIO;
 
-	res = fuse_direct_io(file, iov, nr_segs, iov_length(iov, nr_segs),
-			     ppos, 0, async);
+	res = fuse_direct_io(file, iov, nr_segs, count, ppos, 0, async);
 
 	fuse_invalidate_attr(inode);
 
@@ -1344,7 +1343,7 @@ static ssize_t fuse_direct_read(struct file *file, char __user *buf,
 				size_t count, loff_t *ppos)
 {
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
-	return __fuse_direct_read(file, &iov, 1, ppos, NULL);
+	return __fuse_direct_read(file, &iov, 1, ppos, NULL, count);
 }
 
 static ssize_t __fuse_direct_write(struct file *file, const struct iovec *iov,
@@ -2405,8 +2404,15 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	inode = file->f_mapping->host;
 	i_size = i_size_read(inode);
 
+	/* optimization for short read */
+	if (rw != WRITE && offset + count > i_size) {
+		if (offset >= i_size)
+			return 0;
+		count = i_size - offset;
+	}
+
 	/* cannot write beyond eof asynchronously */
-	if (is_sync_kiocb(iocb) || (offset + count <= i_size) || rw != WRITE) {
+	if (is_sync_kiocb(iocb) || (offset + count <= i_size)) {
 		struct fuse_io_priv *io;
 
 		io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
@@ -2429,7 +2435,8 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	if (rw == WRITE)
 		ret = __fuse_direct_write(file, iov, nr_segs, &pos, async_cb);
 	else
-		ret = __fuse_direct_read(file, iov, nr_segs, &pos, async_cb);
+		ret = __fuse_direct_read(file, iov, nr_segs, &pos, async_cb,
+					 count);
 
 	if (async_cb) {
 		fuse_aio_complete(async_cb->private, ret == count ? 0 : -EIO,
From: Brian F. <bf...@re...> - 2012-12-11 20:32:27
On 12/10/2012 02:41 AM, Maxim V. Patlasov wrote:
> Hi,
> ...
> ---
>
> Maxim V. Patlasov (6):
>       fuse: move fuse_release_user_pages() up
>       fuse: add support of async IO
>       fuse: make fuse_direct_io() aware about AIO
>       fuse: enable asynchronous processing direct IO
>       fuse: truncate file if async dio failed
>       fuse: optimize short direct reads
>
>
>  fs/fuse/cuse.c   |    4 -
>  fs/fuse/file.c   |  276 ++++++++++++++++++++++++++++++++++++++++++++++++------
>  fs/fuse/fuse_i.h |   17 +++
>  3 files changed, 262 insertions(+), 35 deletions(-)
>

Hi Maxim,

Thanks for posting this, first of all. I was reading through the code and found some of the logic a bit hard to follow, particularly due to controlling the behavior of underlying read/write functions based on the 'async' kiocb. For example: conditionally updating inode size in a couple of different places; submitting sync requests async and waiting on them (causing the need to duplicate the update-size call); and the existence of a kiocb called 'async' (when a kiocb can be either sync or async).

Anyways, I couldn't quite put my finger on how to potentially clean that up without messing around with the code, so I've inlined the diff that I came up with that (IMO) cleans things up a bit. This should apply on top of your set and makes the following tweaks:

- Always pass a kiocb down through __fuse_direct_write()/read() (instead of a struct file).
- Trigger the "async" behavior in fuse_direct_io() and fuse_send_read()/write() based on the sync/async nature of the kiocb. This means that sync requests and async requests are sent as such based on the nature of the kiocb (as opposed to whether a kiocb exists).
- Update the various other callers of __fuse_direct_write()/read() and fuse_direct_io() to create and pass a sync kiocb where necessary.
- Use the same approach in fuse_direct_IO() to turn an extending, async write into a sync write.

The tradeoff with this approach is that we slightly uglify the callers that now have to create a sync kiocb. That said, I think some of those codepaths (i.e., fuse_file_aio_write()->fuse_perform_write()->...) could now just pass the kiocb from the vfs straight down. Thoughts?

Brian

This is only lightly tested:

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index beb99e9..89a8f4c 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -92,8 +92,11 @@ static ssize_t cuse_read(struct file *file, char __user *buf, size_t count,
 {
 	loff_t pos = 0;
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
+	struct kiocb iocb;
 
-	return fuse_direct_io(file, &iov, 1, count, &pos, 0, NULL);
+	init_sync_kiocb(&iocb, file);
+
+	return fuse_direct_io(&iocb, &iov, 1, count, &pos, 0);
 }
 
 static ssize_t cuse_write(struct file *file, const char __user *buf,
@@ -101,12 +104,15 @@ static ssize_t cuse_write(struct file *file, const char __user *buf,
 {
 	loff_t pos = 0;
 	struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = count };
+	struct kiocb iocb;
+
+	init_sync_kiocb(&iocb, file);
 
 	/*
 	 * No locking or generic_write_checks(), the server is
 	 * responsible for locking and sanity checks.
 	 */
-	return fuse_direct_io(file, &iov, 1, count, &pos, 1, NULL);
+	return fuse_direct_io(&iocb, &iov, 1, count, &pos, 1);
 }
 
 static int cuse_open(struct inode *inode, struct file *file)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index d2094e1..b07a745 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -597,10 +597,10 @@ static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req,
 	return num_bytes;
 }
 
-static size_t fuse_send_read(struct fuse_req *req, struct file *file,
-			     loff_t pos, size_t count, fl_owner_t owner,
-			     struct kiocb *async)
+static size_t fuse_send_read(struct fuse_req *req, struct kiocb *iocb,
+			     loff_t pos, size_t count, fl_owner_t owner)
 {
+	struct file *file = iocb->ki_filp;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
 
@@ -612,8 +612,8 @@ static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 		inarg->lock_owner = fuse_lock_owner_id(fc, owner);
 	}
 
-	if (async)
-		return fuse_async_req_send(fc, req, count, async);
+	if (!is_sync_kiocb(iocb))
+		return fuse_async_req_send(fc, req, count, iocb);
 
 	fuse_request_send(fc, req);
 	return req->out.args[0].size;
@@ -635,6 +635,7 @@ static void fuse_read_update_size(struct inode *inode, loff_t size,
 
 static int fuse_readpage(struct file *file, struct page *page)
 {
+	struct kiocb iocb;
 	struct inode *inode = page->mapping->host;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_req *req;
@@ -648,6 +649,8 @@ static int fuse_readpage(struct file *file, struct page *page)
 	if (is_bad_inode(inode))
 		goto out;
 
+	init_sync_kiocb(&iocb, file);
+
 	/*
 	 * Page writeback can extend beyond the lifetime of the
 	 * page-cache page, so make sure we read a properly synced
@@ -667,7 +670,7 @@ static int fuse_readpage(struct file *file, struct page *page)
 	req->num_pages = 1;
 	req->pages[0] = page;
 	req->page_descs[0].length = count;
-	num_read = fuse_send_read(req, file, pos, count, NULL, NULL);
+	num_read = fuse_send_read(req, &iocb, pos, count, NULL);
 	err = req->out.h.error;
 	fuse_put_request(fc, req);
 
@@ -869,10 +872,10 @@ static void fuse_write_fill(struct fuse_req *req, struct fuse_file *ff,
 	req->out.args[0].value = outarg;
 }
 
-static size_t fuse_send_write(struct fuse_req *req, struct file *file,
-			      loff_t pos, size_t count, fl_owner_t owner,
-			      struct kiocb *async)
+static size_t fuse_send_write(struct fuse_req *req, struct kiocb *iocb,
			      loff_t pos, size_t count, fl_owner_t owner)
 {
+	struct file *file = iocb->ki_filp;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
 	struct fuse_write_in *inarg = &req->misc.write.in;
@@ -884,8 +887,8 @@ static size_t fuse_send_write(struct fuse_req *req, struct file *file,
 		inarg->lock_owner = fuse_lock_owner_id(fc, owner);
 	}
 
-	if (async)
-		return fuse_async_req_send(fc, req, count, async);
+	if (!is_sync_kiocb(iocb))
+		return fuse_async_req_send(fc, req, count, iocb);
 
 	fuse_request_send(fc, req);
 	return req->misc.write.out.size;
@@ -910,11 +913,14 @@ static size_t fuse_send_write_pages(struct fuse_req *req, struct file *file,
 	size_t res;
 	unsigned offset;
 	unsigned i;
+	struct kiocb iocb;
+
+	init_sync_kiocb(&iocb, file);
 
 	for (i = 0; i < req->num_pages; i++)
 		fuse_wait_on_page_writeback(inode, req->pages[i]->index);
 
-	res = fuse_send_write(req, file, pos, count, NULL, NULL);
+	res = fuse_send_write(req, &iocb, pos, count, NULL);
 
 	offset = req->page_descs[0].offset;
 	count = res;
@@ -1252,10 +1258,11 @@ static inline int fuse_iter_npages(const struct iov_iter *ii_p)
 	return min(npages, FUSE_MAX_PAGES_PER_REQ);
 }
 
-ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
+ssize_t fuse_direct_io(struct kiocb *iocb, const struct iovec *iov,
 		       unsigned long nr_segs, size_t count, loff_t *ppos,
-		       int write, struct kiocb *async)
+		       int write)
 {
+	struct file *file = iocb->ki_filp;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
 	size_t nmax = write ? fc->max_write : fc->max_read;
@@ -1276,7 +1283,7 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		size_t nbytes = min(count, nmax);
 		int err = fuse_get_user_pages(req, &ii, &nbytes, write);
 		if (err) {
-			if (async)
+			if (!is_sync_kiocb(iocb))
 				fuse_put_request(fc, req);
 
 			res = err;
@@ -1284,13 +1291,11 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		}
 
 		if (write)
-			nres = fuse_send_write(req, file, pos, nbytes, owner,
-					       async);
+			nres = fuse_send_write(req, iocb, pos, nbytes, owner);
 		else
-			nres = fuse_send_read(req, file, pos, nbytes, owner,
-					      async);
+			nres = fuse_send_read(req, iocb, pos, nbytes, owner);
 
-		if (!async)
+		if (is_sync_kiocb(iocb))
 			fuse_release_user_pages(req, !write);
 		if (req->out.h.error) {
 			if (!res)
@@ -1306,14 +1311,14 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		if (nres != nbytes)
 			break;
 		if (count) {
-			if (!async)
+			if (is_sync_kiocb(iocb))
 				fuse_put_request(fc, req);
 			req = fuse_get_req(fc, fuse_iter_npages(&ii));
 			if (IS_ERR(req))
 				break;
 		}
 	}
-	if (!IS_ERR(req) && !async)
+	if (!IS_ERR(req) && is_sync_kiocb(iocb))
 		fuse_put_request(fc, req);
 	if (res > 0)
 		*ppos = pos;
@@ -1322,17 +1327,18 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 }
 EXPORT_SYMBOL_GPL(fuse_direct_io);
 
-static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
+static ssize_t __fuse_direct_read(struct kiocb *iocb, const struct iovec *iov,
 				  unsigned long nr_segs, loff_t *ppos,
-				  struct kiocb *async, size_t count)
+				  size_t count)
 {
 	ssize_t res;
+	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_path.dentry->d_inode;
 
 	if (is_bad_inode(inode))
 		return -EIO;
 
-	res = fuse_direct_io(file, iov, nr_segs, count, ppos, 0, async);
+	res = fuse_direct_io(iocb, iov, nr_segs, count, ppos, 0);
 
 	fuse_invalidate_attr(inode);
 
@@ -1342,23 +1348,24 @@ static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
 static ssize_t fuse_direct_read(struct file *file, char __user *buf,
 				size_t count, loff_t *ppos)
 {
+	struct kiocb iocb;
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
-	return __fuse_direct_read(file, &iov, 1, ppos, NULL, count);
+	init_sync_kiocb(&iocb, file);
+	return __fuse_direct_read(&iocb, &iov, 1, ppos, count);
 }
 
-static ssize_t __fuse_direct_write(struct file *file, const struct iovec *iov,
-				   unsigned long nr_segs, loff_t *ppos,
-				   struct kiocb *async)
+static ssize_t __fuse_direct_write(struct kiocb *iocb, const struct iovec *iov,
+				   unsigned long nr_segs, loff_t *ppos)
 {
+	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_path.dentry->d_inode;
 	size_t count = iov_length(iov, nr_segs);
 	ssize_t res;
 
 	res = generic_write_checks(file, ppos, &count, 0);
 	if (!res) {
-		res = fuse_direct_io(file, iov, nr_segs, count, ppos, 1,
-				     async);
-		if (!async && res > 0)
+		res = fuse_direct_io(iocb, iov, nr_segs, count, ppos, 1);
+		if (res > 0)
 			fuse_write_update_size(inode, *ppos);
 	}
 
@@ -1373,13 +1380,16 @@ static ssize_t fuse_direct_write(struct file *file, const char __user *buf,
 	struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = count };
 	struct inode *inode = file->f_path.dentry->d_inode;
 	ssize_t res;
+	struct kiocb iocb;
 
 	if (is_bad_inode(inode))
 		return -EIO;
 
-	/* Don't allow parallel writes to the same file */
+	init_sync_kiocb(&iocb, file);
+
+	/* don't allow parallel writes to the same file */
 	mutex_lock(&inode->i_mutex);
-	res = __fuse_direct_write(file, &iov, 1, ppos, NULL);
+	res = __fuse_direct_write(&iocb, &iov, 1, ppos);
 	mutex_unlock(&inode->i_mutex);
 
 	return res;
@@ -2397,7 +2407,7 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	struct inode *inode;
 	loff_t i_size;
 	size_t count = iov_length(iov, nr_segs);
-	struct kiocb *async_cb = NULL;
+	struct kiocb sync_iocb;
 
 	file = iocb->ki_filp;
 	pos = offset;
@@ -2411,8 +2421,12 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 		count = i_size - offset;
 	}
 
-	/* cannot write beyond eof asynchronously */
- if (is_sync_kiocb(iocb) || (offset + count <= i_size)) { + if (!is_sync_kiocb(iocb) && (offset + count > i_size)) { + init_sync_kiocb(&sync_iocb, file); + iocb = &sync_iocb; + } + + if (!is_sync_kiocb(iocb)) { struct fuse_io_priv *io; io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL); @@ -2428,31 +2442,21 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, io->err = 0; io->iocb = iocb; iocb->private = io; - - async_cb = iocb; } if (rw == WRITE) - ret = __fuse_direct_write(file, iov, nr_segs, &pos, async_cb); + ret = __fuse_direct_write(iocb, iov, nr_segs, &pos); else - ret = __fuse_direct_read(file, iov, nr_segs, &pos, async_cb, - count); - - if (async_cb) { - fuse_aio_complete(async_cb->private, ret == count ? 0 : -EIO, - -1); + ret = __fuse_direct_read(iocb, iov, nr_segs, &pos, count); - if (!is_sync_kiocb(iocb)) - return -EIOCBQUEUED; - - ret = wait_on_sync_kiocb(iocb); + if (!is_sync_kiocb(iocb)) { + fuse_aio_complete(iocb->private, ret == count ? 0 : -EIO, -1); + return -EIOCBQUEUED; + } - if (rw == WRITE) { - if (ret > 0) - fuse_write_update_size(inode, pos); - else if (ret < 0 && offset + count > i_size) - fuse_do_truncate(file); - } + if (rw == WRITE) { + if (ret < 0 && offset + count > i_size) + fuse_do_truncate(file); } return ret; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 173c959..6639793 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -826,9 +826,9 @@ int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid, int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file, bool isdir); -ssize_t fuse_direct_io(struct file *file, const struct iovec *iov, +ssize_t fuse_direct_io(struct kiocb *iocb, const struct iovec *iov, unsigned long nr_segs, size_t count, loff_t *ppos, - int write, struct kiocb *async); + int write); long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg, unsigned int flags); long fuse_ioctl_common(struct file *file, unsigned int cmd, |
From: Maxim V. P. <MPa...@pa...> - 2012-12-14 15:20:22
|
Hi,

The existing fuse implementation always processes direct IO synchronously: it
submits the next request to the userspace fuse server only when the previous
one has completed. This is suboptimal because:

1) libaio DIO works in a blocking way;
2) the userspace fuse server can't achieve parallelism by processing several
   requests simultaneously (e.g. in the case of distributed network storage);
3) the userspace fuse server can't merge requests before passing them to the
   actual storage.

The idea of the patch-set is to submit fuse requests in a non-blocking way
(where possible) and either return -EIOCBQUEUED or wait for their completion
synchronously. The patch-set is to be applied on top of for-next of Miklos'
git repo.

To estimate the performance improvement I used a slightly modified fusexmp
over tmpfs (clearing the O_DIRECT bit from fi->flags in xmp_open). For
synchronous operations I used 'dd' like this:

dd of=/dev/null if=/fuse/mnt/file bs=2M count=256 iflag=direct
dd if=/dev/zero of=/fuse/mnt/file bs=2M count=256 oflag=direct conv=notrunc

For AIO I used 'aio-stress' like this:

aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 1 /fuse/mnt/file
aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 0 /fuse/mnt/file

The throughput on a commodity (rather feeble) server was (in MB/sec):

            original / patched
dd reads:     ~322   /  ~382
dd writes:    ~277   /  ~288
aio reads:    ~380   /  ~459
aio writes:   ~319   /  ~353

Changed in v2 - cleanups suggested by Brian:
- Updated fuse_io_priv with an async field and file pointer to preserve the
  current style of interface (i.e., use this instead of iocb).
- Trigger the type of request submission based on the async field.
- Pulled the fuse_write_update_size() call up out of __fuse_direct_write() to
  make the separate paths more consistent.

Thanks,
Maxim

---

Maxim V. Patlasov (6):
  fuse: move fuse_release_user_pages() up
  fuse: add support of async IO
  fuse: make fuse_direct_io() aware about AIO
  fuse: enable asynchronous processing direct IO
  fuse: truncate file if async dio failed
  fuse: optimize short direct reads

 fs/fuse/cuse.c   |    6 +
 fs/fuse/file.c   |  290 +++++++++++++++++++++++++++++++++++++++++++++++-------
 fs/fuse/fuse_i.h |   19 +++-
 3 files changed, 276 insertions(+), 39 deletions(-)

--
Signature
|
From: Maxim V. P. <MPa...@pa...> - 2012-12-14 15:20:33
|
fuse_release_user_pages() will be indirectly used by fuse_send_read/write
in future patches.

Signed-off-by: Maxim Patlasov <mpa...@pa...>
---
 fs/fuse/file.c |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 19b50e7..6685cb0 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -491,6 +491,18 @@ void fuse_read_fill(struct fuse_req *req, struct file *file, loff_t pos,
 	req->out.args[0].size = count;
 }
 
+static void fuse_release_user_pages(struct fuse_req *req, int write)
+{
+	unsigned i;
+
+	for (i = 0; i < req->num_pages; i++) {
+		struct page *page = req->pages[i];
+		if (write)
+			set_page_dirty_lock(page);
+		put_page(page);
+	}
+}
+
 static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 			     loff_t pos, size_t count, fl_owner_t owner)
 {
@@ -1035,18 +1047,6 @@ out:
 	return written ? written : err;
 }
 
-static void fuse_release_user_pages(struct fuse_req *req, int write)
-{
-	unsigned i;
-
-	for (i = 0; i < req->num_pages; i++) {
-		struct page *page = req->pages[i];
-		if (write)
-			set_page_dirty_lock(page);
-		put_page(page);
-	}
-}
-
 static inline void fuse_page_descs_length_init(struct fuse_req *req,
 					       unsigned index, unsigned nr_pages)
 {
|
From: Maxim V. P. <MPa...@pa...> - 2012-12-14 15:20:48
|
The patch implements a framework to process an IO request asynchronously. The
idea is to associate several fuse requests with a single kiocb by means of the
fuse_io_priv structure. The structure plays the same role for FUSE as 'struct
dio' does for direct-io.c.

The framework is supposed to be used like this:

- someone (who wants to process an IO asynchronously) allocates fuse_io_priv
  and initializes it, setting the 'async' field to a non-zero value.
- as soon as a fuse request is filled, it can be submitted (in a non-blocking
  way) by fuse_async_req_send().
- when all submitted requests are ACKed by userspace, io->reqs drops to zero,
  triggering aio_complete().

In case of IO initiated by libaio, aio_complete() will finish processing the
same way as when dio_complete() calls aio_complete(). But the framework may
also be used for internal FUSE purposes when the initial IO request was
synchronous (from the user's perspective) but it's beneficial to process it
asynchronously. Then the caller should wait on the kiocb explicitly and
aio_complete() will wake the caller up.

Signed-off-by: Maxim Patlasov <mpa...@pa...>
---
 fs/fuse/file.c   |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/fuse_i.h |   17 ++++++++++
 2 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 6685cb0..8dd931f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -503,6 +503,98 @@ static void fuse_release_user_pages(struct fuse_req *req, int write)
 	}
 }
 
+/**
+ * In case of short read, the caller sets 'pos' to the position of
+ * actual end of fuse request in IO request. Otherwise, if bytes_requested
+ * == bytes_transferred or rw == WRITE, the caller sets 'pos' to -1.
+ *
+ * An example:
+ * User requested DIO read of 64K. It was split into two 32K fuse requests,
+ * both submitted asynchronously. The first of them was ACKed by userspace as
+ * fully completed (req->out.args[0].size == 32K) resulting in pos == -1. The
+ * second request was ACKed as short, e.g. only 1K was read, resulting in
+ * pos == 33K.
+ *
+ * Thus, when all fuse requests are completed, the minimal non-negative 'pos'
+ * will be equal to the length of the longest contiguous fragment of
+ * transferred data starting from the beginning of IO request.
+ */
+static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos)
+{
+	int left;
+
+	spin_lock(&io->lock);
+	if (err)
+		io->err = io->err ? : err;
+	else if (pos >= 0 && (io->bytes < 0 || pos < io->bytes))
+		io->bytes = pos;
+
+	left = --io->reqs;
+	spin_unlock(&io->lock);
+
+	if (!left) {
+		long res;
+
+		if (io->err)
+			res = io->err;
+		else if (io->bytes >= 0 && io->write)
+			res = -EIO;
+		else {
+			res = io->bytes < 0 ? io->size : io->bytes;
+
+			if (!is_sync_kiocb(io->iocb)) {
+				struct path *path = &io->iocb->ki_filp->f_path;
+				struct inode *inode = path->dentry->d_inode;
+				struct fuse_conn *fc = get_fuse_conn(inode);
+				struct fuse_inode *fi = get_fuse_inode(inode);
+
+				spin_lock(&fc->lock);
+				fi->attr_version = ++fc->attr_version;
+				spin_unlock(&fc->lock);
+			}
+		}
+
+		aio_complete(io->iocb, res, 0);
+		kfree(io);
+	}
+}
+
+static void fuse_aio_complete_req(struct fuse_conn *fc, struct fuse_req *req)
+{
+	struct fuse_io_priv *io = req->io;
+	ssize_t pos = -1;
+
+	fuse_release_user_pages(req, !io->write);
+
+	if (io->write) {
+		if (req->misc.write.in.size != req->misc.write.out.size)
+			pos = req->misc.write.in.offset - io->offset +
+				req->misc.write.out.size;
+	} else {
+		if (req->misc.read.in.size != req->out.args[0].size)
+			pos = req->misc.read.in.offset - io->offset +
+				req->out.args[0].size;
+	}
+
+	fuse_aio_complete(io, req->out.h.error, pos);
+}
+
+static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req,
+		size_t num_bytes, struct fuse_io_priv *io)
+{
+	spin_lock(&io->lock);
+	io->size += num_bytes;
+	io->reqs++;
+	spin_unlock(&io->lock);
+
+	req->io = io;
+	req->end = fuse_aio_complete_req;
+
+	fuse_request_send_background(fc, req);
+
+	return num_bytes;
+}
+
 static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 			     loff_t pos, size_t count, fl_owner_t owner)
 {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e4f70ea..e0a5b65 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -219,6 +219,20 @@ enum fuse_req_state {
 	FUSE_REQ_FINISHED
 };
 
+/** The request IO state (for asynchronous processing) */
+struct fuse_io_priv {
+	int async;
+	spinlock_t lock;
+	unsigned reqs;
+	ssize_t bytes;
+	size_t size;
+	__u64 offset;
+	bool write;
+	int err;
+	struct kiocb *iocb;
+	struct file *file;
+};
+
 /**
  * A request to the client
  */
@@ -323,6 +337,9 @@ struct fuse_req {
 	/** Inode used in the request or NULL */
 	struct inode *inode;
 
+	/** AIO control block */
+	struct fuse_io_priv *io;
+
 	/** Link on fi->writepages */
 	struct list_head writepages_entry;
|
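The "minimal non-negative pos" accounting described in the comment above can be modeled as a small userspace sketch (the names demo_io, demo_complete, demo_result are illustrative, not kernel API). Replaying the 64K example from the comment — first 32K request fully completed (pos == -1), second short at 1K (pos == 33K) — the final result is 33K:

```c
#include <stddef.h>
#include <sys/types.h>

/* Userspace model of fuse_aio_complete()'s byte accounting. Each completed
 * request reports an error and, for a short transfer, 'pos': its offset
 * within the whole IO plus the bytes actually transferred. Full-length
 * transfers report pos == -1. */
struct demo_io {
	int	err;	/* first error seen, if any */
	ssize_t	bytes;	/* minimal non-negative pos; -1 if none reported */
	size_t	size;	/* total bytes submitted */
	unsigned reqs;	/* requests still in flight */
};

void demo_complete(struct demo_io *io, int err, ssize_t pos)
{
	if (err)
		io->err = io->err ? io->err : err;	/* keep first error */
	else if (pos >= 0 && (io->bytes < 0 || pos < io->bytes))
		io->bytes = pos;			/* track minimal pos */
	io->reqs--;
}

/* Result reported to aio_complete() once io->reqs reaches zero: either the
 * first error, or the longest contiguous prefix of transferred data. */
ssize_t demo_result(const struct demo_io *io)
{
	if (io->err)
		return io->err;
	return io->bytes < 0 ? (ssize_t)io->size : io->bytes;
}
```

This mirrors only the arithmetic; the real code additionally holds io->lock and frees the context from the completion path.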
From: Maxim V. P. <MPa...@pa...> - 2012-12-14 15:20:58
|
The patch implements passing "struct fuse_io_priv *io" down the call stack to
fuse_send_read/write, where it is used to submit requests asynchronously.
io->async == 0 designates synchronous processing.

A non-trivial part of the patch is the changes in fuse_direct_io(): resources
like fuse requests and user pages cannot be released immediately in the async
case.

Signed-off-by: Maxim Patlasov <mpa...@pa...>
---
 fs/fuse/cuse.c   |    6 +++--
 fs/fuse/file.c   |   69 +++++++++++++++++++++++++++++++++++++++---------------
 fs/fuse/fuse_i.h |    2 +-
 3 files changed, 55 insertions(+), 22 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 65ce10a..d890901 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -92,8 +92,9 @@ static ssize_t cuse_read(struct file *file, char __user *buf, size_t count,
 {
 	loff_t pos = 0;
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
+	struct fuse_io_priv io = { .async = 0, .file = file };
 
-	return fuse_direct_io(file, &iov, 1, count, &pos, 0);
+	return fuse_direct_io(&io, &iov, 1, count, &pos, 0);
 }
 
 static ssize_t cuse_write(struct file *file, const char __user *buf,
@@ -101,12 +102,13 @@ static ssize_t cuse_write(struct file *file, const char __user *buf,
 {
 	loff_t pos = 0;
 	struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = count };
+	struct fuse_io_priv io = { .async = 0, .file = file };
 
 	/*
 	 * No locking or generic_write_checks(), the server is
 	 * responsible for locking and sanity checks.
 	 */
-	return fuse_direct_io(file, &iov, 1, count, &pos, 1);
+	return fuse_direct_io(&io, &iov, 1, count, &pos, 1);
 }
 
 static int cuse_open(struct inode *inode, struct file *file)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8dd931f..6c2ca8a 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -595,9 +595,10 @@ static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req,
 	return num_bytes;
 }
 
-static size_t fuse_send_read(struct fuse_req *req, struct file *file,
+static size_t fuse_send_read(struct fuse_req *req, struct fuse_io_priv *io,
 			     loff_t pos, size_t count, fl_owner_t owner)
 {
+	struct file *file = io->file;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
 
@@ -608,6 +609,10 @@ static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 		inarg->read_flags |= FUSE_READ_LOCKOWNER;
 		inarg->lock_owner = fuse_lock_owner_id(fc, owner);
 	}
+
+	if (io->async)
+		return fuse_async_req_send(fc, req, count, io);
+
 	fuse_request_send(fc, req);
 	return req->out.args[0].size;
 }
@@ -628,6 +633,7 @@ static void fuse_read_update_size(struct inode *inode, loff_t size,
 
 static int fuse_readpage(struct file *file, struct page *page)
 {
+	struct fuse_io_priv io = { .async = 0, .file = file };
 	struct inode *inode = page->mapping->host;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_req *req;
@@ -660,7 +666,7 @@ static int fuse_readpage(struct file *file, struct page *page)
 	req->num_pages = 1;
 	req->pages[0] = page;
 	req->page_descs[0].length = count;
-	num_read = fuse_send_read(req, file, pos, count, NULL);
+	num_read = fuse_send_read(req, &io, pos, count, NULL);
 	err = req->out.h.error;
 	fuse_put_request(fc, req);
 
@@ -862,9 +868,10 @@ static void fuse_write_fill(struct fuse_req *req, struct fuse_file *ff,
 	req->out.args[0].value = outarg;
 }
 
-static size_t fuse_send_write(struct fuse_req *req, struct file *file,
+static size_t fuse_send_write(struct fuse_req *req, struct fuse_io_priv *io,
 			      loff_t pos, size_t count, fl_owner_t owner)
 {
+	struct file *file = io->file;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
 	struct fuse_write_in *inarg = &req->misc.write.in;
@@ -875,6 +882,10 @@ static size_t fuse_send_write(struct fuse_req *req, struct file *file,
 		inarg->write_flags |= FUSE_WRITE_LOCKOWNER;
 		inarg->lock_owner = fuse_lock_owner_id(fc, owner);
 	}
+
+	if (io->async)
+		return fuse_async_req_send(fc, req, count, io);
+
 	fuse_request_send(fc, req);
 	return req->misc.write.out.size;
 }
@@ -898,11 +909,12 @@ static size_t fuse_send_write_pages(struct fuse_req *req, struct file *file,
 	size_t res;
 	unsigned offset;
 	unsigned i;
+	struct fuse_io_priv io = { .async = 0, .file = file };
 
 	for (i = 0; i < req->num_pages; i++)
 		fuse_wait_on_page_writeback(inode, req->pages[i]->index);
 
-	res = fuse_send_write(req, file, pos, count, NULL);
+	res = fuse_send_write(req, &io, pos, count, NULL);
 
 	offset = req->page_descs[0].offset;
 	count = res;
@@ -1240,10 +1252,11 @@ static inline int fuse_iter_npages(const struct iov_iter *ii_p)
 	return min(npages, FUSE_MAX_PAGES_PER_REQ);
 }
 
-ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
+ssize_t fuse_direct_io(struct fuse_io_priv *io, const struct iovec *iov,
 		       unsigned long nr_segs, size_t count, loff_t *ppos,
 		       int write)
 {
+	struct file *file = io->file;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
 	size_t nmax = write ? fc->max_write : fc->max_read;
@@ -1264,16 +1277,20 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		size_t nbytes = min(count, nmax);
 		int err = fuse_get_user_pages(req, &ii, &nbytes, write);
 		if (err) {
+			if (io->async)
+				fuse_put_request(fc, req);
+
 			res = err;
 			break;
 		}
 
 		if (write)
-			nres = fuse_send_write(req, file, pos, nbytes, owner);
+			nres = fuse_send_write(req, io, pos, nbytes, owner);
 		else
-			nres = fuse_send_read(req, file, pos, nbytes, owner);
+			nres = fuse_send_read(req, io, pos, nbytes, owner);
 
-		fuse_release_user_pages(req, !write);
+		if (!io->async)
+			fuse_release_user_pages(req, !write);
 		if (req->out.h.error) {
 			if (!res)
 				res = req->out.h.error;
@@ -1288,13 +1305,14 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		if (nres != nbytes)
 			break;
 		if (count) {
-			fuse_put_request(fc, req);
+			if (!io->async)
+				fuse_put_request(fc, req);
 			req = fuse_get_req(fc, fuse_iter_npages(&ii));
 			if (IS_ERR(req))
 				break;
 		}
 	}
-	if (!IS_ERR(req))
+	if (!IS_ERR(req) && !io->async)
 		fuse_put_request(fc, req);
 	if (res > 0)
 		*ppos = pos;
@@ -1303,16 +1321,17 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 }
 EXPORT_SYMBOL_GPL(fuse_direct_io);
 
-static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
+static ssize_t __fuse_direct_read(struct fuse_io_priv *io, const struct iovec *iov,
 				  unsigned long nr_segs, loff_t *ppos)
 {
 	ssize_t res;
+	struct file *file = io->file;
 	struct inode *inode = file->f_path.dentry->d_inode;
 
 	if (is_bad_inode(inode))
 		return -EIO;
 
-	res = fuse_direct_io(file, iov, nr_segs, iov_length(iov, nr_segs),
+	res = fuse_direct_io(io, iov, nr_segs, iov_length(iov, nr_segs),
 			     ppos, 0);
 
 	fuse_invalidate_attr(inode);
 
@@ -1323,21 +1342,23 @@ static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
 static ssize_t fuse_direct_read(struct file *file, char __user *buf,
 				     size_t count, loff_t *ppos)
 {
+	struct fuse_io_priv io = { .async = 0, .file = file };
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
-	return __fuse_direct_read(file, &iov, 1, ppos);
+	return __fuse_direct_read(&io, &iov, 1, ppos);
 }
 
-static ssize_t __fuse_direct_write(struct file *file, const struct iovec *iov,
+static ssize_t __fuse_direct_write(struct fuse_io_priv *io, const struct iovec *iov,
 				   unsigned long nr_segs, loff_t *ppos)
 {
+	struct file *file = io->file;
 	struct inode *inode = file->f_path.dentry->d_inode;
 	size_t count = iov_length(iov, nr_segs);
 	ssize_t res;
 
 	res = generic_write_checks(file, ppos, &count, 0);
 	if (!res) {
-		res = fuse_direct_io(file, iov, nr_segs, count, ppos, 1);
-		if (res > 0)
+		res = fuse_direct_io(io, iov, nr_segs, count, ppos, 1);
+		if (!io->async && res > 0)
 			fuse_write_update_size(inode, *ppos);
 	}
 
@@ -1352,13 +1373,14 @@ static ssize_t fuse_direct_write(struct file *file, const char __user *buf,
 	struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = count };
 	struct inode *inode = file->f_path.dentry->d_inode;
 	ssize_t res;
+	struct fuse_io_priv io = { .async = 0, .file = file };
 
 	if (is_bad_inode(inode))
 		return -EIO;
 
 	/* Don't allow parallel writes to the same file */
 	mutex_lock(&inode->i_mutex);
-	res = __fuse_direct_write(file, &iov, 1, ppos);
+	res = __fuse_direct_write(&io, &iov, 1, ppos);
 	mutex_unlock(&inode->i_mutex);
 
 	return res;
@@ -2326,14 +2348,23 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	ssize_t ret = 0;
 	struct file *file = NULL;
 	loff_t pos = 0;
+	struct fuse_io_priv *io;
 
 	file = iocb->ki_filp;
 	pos = offset;
 
+	io = kzalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
+	if (!io)
+		return -ENOMEM;
+
+	io->file = file;
+
 	if (rw == WRITE)
-		ret = __fuse_direct_write(file, iov, nr_segs, &pos);
+		ret = __fuse_direct_write(io, iov, nr_segs, &pos);
 	else
-		ret = __fuse_direct_read(file, iov, nr_segs, &pos);
+		ret = __fuse_direct_read(io, iov, nr_segs, &pos);
+
+	kfree(io);
 
 	return ret;
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e0a5b65..91b5192 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -828,7 +828,7 @@ int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid,
 int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
 		 bool isdir);
-ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
+ssize_t fuse_direct_io(struct fuse_io_priv *io, const struct iovec *iov,
 		       unsigned long nr_segs, size_t count, loff_t *ppos,
 		       int write);
 long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg,
 		   unsigned int flags);
|
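The resource-lifetime rule this commit message calls non-trivial — a synchronous context lives on the submitter's stack, while an asynchronous one must stay alive until the last completion ACK and is released from the completion path — can be sketched as a small userspace model (all names here are illustrative, not kernel API):

```c
#include <stdlib.h>

/* Toy analogue of fuse_io_priv ownership. */
struct io_ctx {
	int async;		/* 0: caller owns ctx; 1: completion path frees it */
	int inflight;		/* analogue of io->reqs */
	int *freed_flag;	/* demo only: records that the end callback ran */
};

static void io_end(struct io_ctx *io)
{
	if (io->freed_flag)
		*io->freed_flag = 1;
	free(io);	/* async context released only by the completion path */
}

void io_submit_one(struct io_ctx *io)
{
	io->inflight++;
}

void io_complete_one(struct io_ctx *io)
{
	if (--io->inflight == 0 && io->async)
		io_end(io);	/* last ACK: only now is it safe to free */
	/* sync case: the submitting thread still owns io (on its stack),
	 * just as fuse_direct_io() keeps req/pages until the reply arrives */
}
```

The point mirrored here is the one the patch makes in fuse_direct_io(): in the async case neither the request nor the pinned user pages may be released at submission time, because completion happens later and concurrently.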
From: Maxim V. P. <MPa...@pa...> - 2012-12-14 15:21:15
|
In case of a synchronous DIO request (i.e. read(2) or write(2) on a file
opened with O_DIRECT), the patch submits fuse requests asynchronously, but
waits for their completion before returning from fuse_direct_IO().

In case of an asynchronous DIO request (i.e. libaio io_submit() on a file
opened with O_DIRECT), the patch submits fuse requests asynchronously and
returns -EIOCBQUEUED immediately.

The only special case is an async DIO request extending the file. Here the
patch falls back to the old behaviour, because we can't return -EIOCBQUEUED
and update i_size later without holding i_mutex, and we have no method to
wait on real async I/O requests.

The patch also cleans up __fuse_direct_write(): it's better to update i_size
in its callers. Thanks to Brian for the suggestion.

Signed-off-by: Maxim Patlasov <mpa...@pa...>
---
 fs/fuse/file.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 6c2ca8a..05eed23 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1356,11 +1356,8 @@ static ssize_t __fuse_direct_write(struct fuse_io_priv *io, const struct iovec *
 	ssize_t res;
 
 	res = generic_write_checks(file, ppos, &count, 0);
-	if (!res) {
+	if (!res)
 		res = fuse_direct_io(io, iov, nr_segs, count, ppos, 1);
-		if (!io->async && res > 0)
-			fuse_write_update_size(inode, *ppos);
-	}
 
 	fuse_invalidate_attr(inode);
 
@@ -1381,6 +1378,8 @@ static ssize_t fuse_direct_write(struct file *file, const char __user *buf,
 	/* Don't allow parallel writes to the same file */
 	mutex_lock(&inode->i_mutex);
 	res = __fuse_direct_write(&io, &iov, 1, ppos);
+	if (res > 0)
+		fuse_write_update_size(inode, *ppos);
 	mutex_unlock(&inode->i_mutex);
 
 	return res;
@@ -2348,23 +2347,61 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	ssize_t ret = 0;
 	struct file *file = NULL;
 	loff_t pos = 0;
+	struct inode *inode;
+	loff_t i_size;
+	size_t count = iov_length(iov, nr_segs);
 	struct fuse_io_priv *io;
 
 	file = iocb->ki_filp;
 	pos = offset;
+	inode = file->f_mapping->host;
+	i_size = i_size_read(inode);
 
-	io = kzalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
+	io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
 	if (!io)
 		return -ENOMEM;
-
+	spin_lock_init(&io->lock);
+	io->reqs = 1;
+	io->bytes = -1;
+	io->size = 0;
+	io->offset = offset;
+	io->write = (rw == WRITE);
+	io->err = 0;
 	io->file = file;
+	/*
+	 * By default, we want to optimize all I/Os with async request submission
+	 * to the client filesystem.
+	 */
+	io->async = 1;
+	io->iocb = iocb;
+
+	/*
+	 * We cannot asynchronously extend the size of a file. We have no method
+	 * to wait on real async I/O requests, so we must submit this request
+	 * synchronously.
+	 */
+	if (!is_sync_kiocb(iocb) && (offset + count > i_size) && rw == WRITE)
+		io->async = 0;
 
 	if (rw == WRITE)
 		ret = __fuse_direct_write(io, iov, nr_segs, &pos);
 	else
 		ret = __fuse_direct_read(io, iov, nr_segs, &pos);
 
-	kfree(io);
+	if (io->async) {
+		fuse_aio_complete(io, ret == count ? 0 : -EIO, -1);
+
+		/* we have a non-extending, async request, so return */
+		if (!is_sync_kiocb(iocb))
+			return -EIOCBQUEUED;
+
+		ret = wait_on_sync_kiocb(iocb);
+	} else {
+		kfree(io);
+	}
+
+	if (rw == WRITE && ret > 0)
+		fuse_write_update_size(inode, pos);
 
 	return ret;
 }
|
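The three-way decision this patch introduces in fuse_direct_IO() — extending async write falls back to synchronous submission, a sync kiocb submits asynchronously and then waits, and every other async kiocb gets -EIOCBQUEUED — can be captured in a small userspace model (the helper name and enum are illustrative, not kernel code):

```c
/* Userspace model of the submission-mode decision in this patch (v2). */
enum submit_mode {
	SUBMIT_SYNC,		/* requests sent one by one, io->async == 0 */
	SUBMIT_ASYNC_WAIT,	/* async submission, wait_on_sync_kiocb()   */
	SUBMIT_ASYNC_QUEUED	/* async submission, return -EIOCBQUEUED    */
};

enum submit_mode choose_mode(int sync_kiocb, int is_write,
			     long long offset, long long count,
			     long long i_size)
{
	/* Extending async write: fall back to synchronous processing,
	 * since i_size can't be updated after -EIOCBQUEUED is returned. */
	if (!sync_kiocb && offset + count > i_size && is_write)
		return SUBMIT_SYNC;

	/* Otherwise requests go out in a non-blocking way: a sync kiocb
	 * waits for them here, an async one is reported as queued. */
	return sync_kiocb ? SUBMIT_ASYNC_WAIT : SUBMIT_ASYNC_QUEUED;
}
```

Note that in this patch the fallback is restricted to writes; the later "optimize short direct reads" patch widens the condition to any extending request because reads get clamped to i_size beforehand.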
From: Maxim V. P. <MPa...@pa...> - 2012-12-14 15:21:33
|
The patch improves error handling in fuse_direct_IO(): if we successfully
submitted several fuse requests on behalf of a synchronous direct write
extending the file and some of them failed, let's try to do our best to
clean up.

Signed-off-by: Maxim Patlasov <mpa...@pa...>
---
 fs/fuse/file.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 05eed23..b6e9b8d 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2340,6 +2340,53 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
 	return 0;
 }
 
+static void fuse_do_truncate(struct file *file)
+{
+	struct fuse_file *ff = file->private_data;
+	struct inode *inode = file->f_mapping->host;
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_req *req;
+	struct fuse_setattr_in inarg;
+	struct fuse_attr_out outarg;
+	int err;
+
+	req = fuse_get_req_nopages(fc);
+	if (IS_ERR(req)) {
+		printk(KERN_WARNING "failed to allocate req for truncate "
+		       "(%ld)\n", PTR_ERR(req));
+		return;
+	}
+
+	memset(&inarg, 0, sizeof(inarg));
+	memset(&outarg, 0, sizeof(outarg));
+
+	inarg.valid |= FATTR_SIZE;
+	inarg.size = i_size_read(inode);
+
+	inarg.valid |= FATTR_FH;
+	inarg.fh = ff->fh;
+
+	req->in.h.opcode = FUSE_SETATTR;
+	req->in.h.nodeid = get_node_id(inode);
+	req->in.numargs = 1;
+	req->in.args[0].size = sizeof(inarg);
+	req->in.args[0].value = &inarg;
+	req->out.numargs = 1;
+	if (fc->minor < 9)
+		req->out.args[0].size = FUSE_COMPAT_ATTR_OUT_SIZE;
+	else
+		req->out.args[0].size = sizeof(outarg);
+	req->out.args[0].value = &outarg;
+
+	fuse_request_send(fc, req);
+	err = req->out.h.error;
+	fuse_put_request(fc, req);
+
+	if (err)
+		printk(KERN_WARNING "failed to truncate to %lld with error "
+		       "%d\n", i_size_read(inode), err);
+}
+
 static ssize_t
 fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	       loff_t offset, unsigned long nr_segs)
@@ -2400,8 +2447,12 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 		kfree(io);
 	}
 
-	if (rw == WRITE && ret > 0)
-		fuse_write_update_size(inode, pos);
+	if (rw == WRITE) {
+		if (ret > 0)
+			fuse_write_update_size(inode, pos);
+		else if (ret < 0 && offset + count > i_size)
+			fuse_do_truncate(file);
+	}
 
 	return ret;
 }
|
From: Maxim V. P. <MPa...@pa...> - 2012-12-18 10:05:03
|
The patch improves error handling in fuse_direct_IO(): if we successfully
submitted several fuse requests on behalf of a synchronous direct write
extending the file and some of them failed, let's try to do our best to
clean up.

Changed in v2: reuse fuse_do_setattr(). Thanks to Brian for the suggestion.

Signed-off-by: Maxim Patlasov <mpa...@pa...>
---
 fs/fuse/dir.c    |   17 +++++++++--------
 fs/fuse/file.c   |   27 +++++++++++++++++++++++++--
 fs/fuse/fuse_i.h |    3 +++
 3 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 20b52a5..049d4c2 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1532,10 +1532,9 @@ void fuse_release_nowrite(struct inode *inode)
  * vmtruncate() doesn't allow for this case, so do the rlimit checking
  * and the actual truncation by hand.
  */
-static int fuse_do_setattr(struct dentry *entry, struct iattr *attr,
-			   struct file *file)
+int fuse_do_setattr(struct inode *inode, struct iattr *attr,
+		    struct file *file)
 {
-	struct inode *inode = entry->d_inode;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_req *req;
 	struct fuse_setattr_in inarg;
@@ -1544,9 +1543,6 @@ static int fuse_do_setattr(struct dentry *entry, struct iattr *attr,
 	loff_t oldsize;
 	int err;
 
-	if (!fuse_allow_task(fc, current))
-		return -EACCES;
-
 	if (!(fc->flags & FUSE_DEFAULT_PERMISSIONS))
 		attr->ia_valid |= ATTR_FORCE;
 
@@ -1641,10 +1637,15 @@ error:
 
 static int fuse_setattr(struct dentry *entry, struct iattr *attr)
 {
+	struct inode *inode = entry->d_inode;
+
+	if (!fuse_allow_task(get_fuse_conn(inode), current))
+		return -EACCES;
+
 	if (attr->ia_valid & ATTR_FILE)
-		return fuse_do_setattr(entry, attr, attr->ia_file);
+		return fuse_do_setattr(inode, attr, attr->ia_file);
 	else
-		return fuse_do_setattr(entry, attr, NULL);
+		return fuse_do_setattr(inode, attr, NULL);
 }
 
 static int fuse_getattr(struct vfsmount *mnt, struct dentry *entry,
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 05eed23..d9a0568 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2340,6 +2340,25 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
 	return 0;
 }
 
+static void fuse_do_truncate(struct file *file)
+{
+	struct inode *inode = file->f_mapping->host;
+	struct iattr attr;
+	int err;
+
+	attr.ia_valid = ATTR_SIZE;
+	attr.ia_size = i_size_read(inode);
+
+	attr.ia_file = file;
+	attr.ia_valid |= ATTR_FILE;
+
+	err = fuse_do_setattr(inode, &attr, file);
+
+	if (err)
+		printk(KERN_WARNING "failed to truncate to %lld with error "
+		       "%d\n", i_size_read(inode), err);
+}
+
 static ssize_t
 fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	       loff_t offset, unsigned long nr_segs)
@@ -2400,8 +2419,12 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 		kfree(io);
 	}
 
-	if (rw == WRITE && ret > 0)
-		fuse_write_update_size(inode, pos);
+	if (rw == WRITE) {
+		if (ret > 0)
+			fuse_write_update_size(inode, pos);
+		else if (ret < 0 && offset + count > i_size)
+			fuse_do_truncate(file);
+	}
 
 	return ret;
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 91b5192..d4f7f07 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -840,4 +840,7 @@ int fuse_dev_release(struct inode *inode, struct file *file);
 
 void fuse_write_update_size(struct inode *inode, loff_t pos);
 
+int fuse_do_setattr(struct inode *inode, struct iattr *attr,
+		    struct file *file);
+
 #endif /* _FS_FUSE_I_H */
|
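The write-completion policy this patch arrives at — advance i_size on success, truncate back a failed size-extending write, do nothing otherwise — reduces to a small decision function. A userspace sketch (the enum and helper name are illustrative, not kernel code):

```c
/* Model of the post-IO cleanup choice made at the end of fuse_direct_IO()
 * for rw == WRITE in this patch. */
enum dio_cleanup {
	DIO_NONE,		/* nothing to fix up */
	DIO_UPDATE_SIZE,	/* ret > 0: publish the new i_size */
	DIO_TRUNCATE		/* failed extending write: roll back */
};

enum dio_cleanup dio_write_cleanup(long long ret, long long offset,
				   long long count, long long i_size)
{
	if (ret > 0)
		return DIO_UPDATE_SIZE;
	if (ret < 0 && offset + count > i_size)
		return DIO_TRUNCATE;	/* fuse_do_setattr() back to i_size */
	return DIO_NONE;
}
```

The truncate case only arises for extending writes: a failed non-extending write leaves i_size correct, so no cleanup is needed there.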
From: Maxim V. P. <MPa...@pa...> - 2012-12-14 15:21:33
|
If the user requested a direct read beyond EOF, we can skip sending fuse
requests for positions beyond EOF because userspace would ACK them with zero
bytes read anyway. We can trust i_size in fuse_direct_IO() for such cases
because it's called from fuse_file_aio_read() and the latter updates fuse
attributes, including i_size.

Signed-off-by: Maxim Patlasov <mpa...@pa...>
---
 fs/fuse/file.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b6e9b8d..ceacd20 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1322,7 +1322,8 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, const struct iovec *iov,
 EXPORT_SYMBOL_GPL(fuse_direct_io);
 
 static ssize_t __fuse_direct_read(struct fuse_io_priv *io, const struct iovec *iov,
-				  unsigned long nr_segs, loff_t *ppos)
+				  unsigned long nr_segs, loff_t *ppos,
+				  size_t count)
 {
 	ssize_t res;
 	struct file *file = io->file;
@@ -1331,8 +1332,7 @@ static ssize_t __fuse_direct_read(struct fuse_io_priv *io, const struct iovec *i
 	if (is_bad_inode(inode))
 		return -EIO;
 
-	res = fuse_direct_io(io, iov, nr_segs, iov_length(iov, nr_segs),
-			     ppos, 0);
+	res = fuse_direct_io(io, iov, nr_segs, count, ppos, 0);
 
 	fuse_invalidate_attr(inode);
 
@@ -1344,7 +1344,7 @@ static ssize_t fuse_direct_read(struct file *file, char __user *buf,
 {
 	struct fuse_io_priv io = { .async = 0, .file = file };
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
-	return __fuse_direct_read(&io, &iov, 1, ppos);
+	return __fuse_direct_read(&io, &iov, 1, ppos, count);
 }
 
 static ssize_t __fuse_direct_write(struct fuse_io_priv *io, const struct iovec *iov,
@@ -2404,6 +2404,13 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	inode = file->f_mapping->host;
 	i_size = i_size_read(inode);
 
+	/* optimization for short read */
+	if (rw != WRITE && offset + count > i_size) {
+		if (offset >= i_size)
+			return 0;
+		count = i_size - offset;
+	}
+
 	io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
 	if (!io)
 		return -ENOMEM;
@@ -2427,13 +2434,13 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	 * to wait on real async I/O requests, so we must submit this request
 	 * synchronously.
 	 */
-	if (!is_sync_kiocb(iocb) && (offset + count > i_size) && rw == WRITE)
+	if (!is_sync_kiocb(iocb) && (offset + count > i_size))
 		io->async = 0;
 
 	if (rw == WRITE)
 		ret = __fuse_direct_write(io, iov, nr_segs, &pos);
 	else
-		ret = __fuse_direct_read(io, iov, nr_segs, &pos);
+		ret = __fuse_direct_read(io, iov, nr_segs, &pos, count);
 
 	if (io->async) {
 		fuse_aio_complete(io, ret == count ? 0 : -EIO, -1);
|
From: Brian F. <bf...@re...> - 2012-12-14 20:18:55
|
On 12/14/2012 10:21 AM, Maxim V. Patlasov wrote: > The patch improves error handling in fuse_direct_IO(): if we successfully > submitted several fuse requests on behalf of synchronous direct write > extending file and some of them failed, let's try to do our best to clean-up. > > Signed-off-by: Maxim Patlasov <mpa...@pa...> > --- > fs/fuse/file.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- > 1 files changed, 53 insertions(+), 2 deletions(-) > > diff --git a/fs/fuse/file.c b/fs/fuse/file.c > index 05eed23..b6e9b8d 100644 > --- a/fs/fuse/file.c > +++ b/fs/fuse/file.c > @@ -2340,6 +2340,53 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc, > return 0; > } > > +static void fuse_do_truncate(struct file *file) > +{ > + struct fuse_file *ff = file->private_data; > + struct inode *inode = file->f_mapping->host; > + struct fuse_conn *fc = get_fuse_conn(inode); > + struct fuse_req *req; > + struct fuse_setattr_in inarg; > + struct fuse_attr_out outarg; > + int err; > + > + req = fuse_get_req_nopages(fc); > + if (IS_ERR(req)) { > + printk(KERN_WARNING "failed to allocate req for truncate " > + "(%ld)\n", PTR_ERR(req)); > + return; > + } > + > + memset(&inarg, 0, sizeof(inarg)); > + memset(&outarg, 0, sizeof(outarg)); > + > + inarg.valid |= FATTR_SIZE; > + inarg.size = i_size_read(inode); > + > + inarg.valid |= FATTR_FH; > + inarg.fh = ff->fh; > + > + req->in.h.opcode = FUSE_SETATTR; > + req->in.h.nodeid = get_node_id(inode); > + req->in.numargs = 1; > + req->in.args[0].size = sizeof(inarg); > + req->in.args[0].value = &inarg; > + req->out.numargs = 1; > + if (fc->minor < 9) > + req->out.args[0].size = FUSE_COMPAT_ATTR_OUT_SIZE; > + else > + req->out.args[0].size = sizeof(outarg); > + req->out.args[0].value = &outarg; > + > + fuse_request_send(fc, req); > + err = req->out.h.error; > + fuse_put_request(fc, req); > + > + if (err) > + printk(KERN_WARNING "failed to truncate to %lld with error " > + "%d\n", i_size_read(inode), err); > +} > + fuse_do_truncate() 
looks fairly close to fuse_do_setattr(). Is there any reason we couldn't make fuse_do_setattr() non-static, change the dentry parameter to an inode and use that? Brian > static ssize_t > fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, > loff_t offset, unsigned long nr_segs) > @@ -2400,8 +2447,12 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, > kfree(io); > } > > - if (rw == WRITE && ret > 0) > - fuse_write_update_size(inode, pos); > + if (rw == WRITE) { > + if (ret > 0) > + fuse_write_update_size(inode, pos); > + else if (ret < 0 && offset + count > i_size) > + fuse_do_truncate(file); > + } > > return ret; > } > |
From: Maxim V. P. <mpa...@pa...> - 2012-12-17 14:13:37
|
Hi, 12/15/2012 12:16 AM, Brian Foster wrote: > On 12/14/2012 10:21 AM, Maxim V. Patlasov wrote: >> The patch improves error handling in fuse_direct_IO(): if we successfully >> submitted several fuse requests on behalf of synchronous direct write >> extending file and some of them failed, let's try to do our best to clean-up. >> >> Signed-off-by: Maxim Patlasov <mpa...@pa...> >> --- >> fs/fuse/file.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- >> 1 files changed, 53 insertions(+), 2 deletions(-) >> >> diff --git a/fs/fuse/file.c b/fs/fuse/file.c >> index 05eed23..b6e9b8d 100644 >> --- a/fs/fuse/file.c >> +++ b/fs/fuse/file.c >> @@ -2340,6 +2340,53 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc, >> return 0; >> } >> >> +static void fuse_do_truncate(struct file *file) >> +{ >> + struct fuse_file *ff = file->private_data; >> + struct inode *inode = file->f_mapping->host; >> + struct fuse_conn *fc = get_fuse_conn(inode); >> + struct fuse_req *req; >> + struct fuse_setattr_in inarg; >> + struct fuse_attr_out outarg; >> + int err; >> + >> + req = fuse_get_req_nopages(fc); >> + if (IS_ERR(req)) { >> + printk(KERN_WARNING "failed to allocate req for truncate " >> + "(%ld)\n", PTR_ERR(req)); >> + return; >> + } >> + >> + memset(&inarg, 0, sizeof(inarg)); >> + memset(&outarg, 0, sizeof(outarg)); >> + >> + inarg.valid |= FATTR_SIZE; >> + inarg.size = i_size_read(inode); >> + >> + inarg.valid |= FATTR_FH; >> + inarg.fh = ff->fh; >> + >> + req->in.h.opcode = FUSE_SETATTR; >> + req->in.h.nodeid = get_node_id(inode); >> + req->in.numargs = 1; >> + req->in.args[0].size = sizeof(inarg); >> + req->in.args[0].value = &inarg; >> + req->out.numargs = 1; >> + if (fc->minor < 9) >> + req->out.args[0].size = FUSE_COMPAT_ATTR_OUT_SIZE; >> + else >> + req->out.args[0].size = sizeof(outarg); >> + req->out.args[0].value = &outarg; >> + >> + fuse_request_send(fc, req); >> + err = req->out.h.error; >> + fuse_put_request(fc, req); >> + >> + if (err) >> + printk(KERN_WARNING 
"failed to truncate to %lld with error " >> + "%d\n", i_size_read(inode), err); >> +} >> + > fuse_do_truncate() looks fairly close to fuse_do_setattr(). Is there any > reason we couldn't make fuse_do_setattr() non-static, change the dentry > parameter to an inode and use that? fuse_do_setattr() performs extra checks that fuse_do_truncate() needn't. Some of them are harmless, some not: fuse_allow_task() may return 0 if task credentials changed. E.g. super-user successfully opened a file, then setuid(other_user_uid), then write(2) to the file. write(2) doesn't check uid, but fuse_do_truncate() - via fuse_allow_task() - does. This non-POSIX behaviour (ftruncate(2) returning -1 with errno==EACCES) was introduced a long time ago: > commit e57ac68378a287d6336d187b26971f35f7ee7251 > Author: Miklos Szeredi <msz...@su...> > Date: Thu Oct 18 03:06:58 2007 -0700 > > fuse: fix allowing operations > > The following operation didn't check if sending the request was > allowed: > > setattr > listxattr > statfs > > Some other operations don't explicitly do the check, but VFS calls > ->permission() which checks this. > > Signed-off-by: Miklos Szeredi <msz...@su...> > Signed-off-by: Andrew Morton <ak...@li...> > Signed-off-by: Linus Torvalds <tor...@li...> and I'm not sure whether it was done intentionally or not. Maybe Miklos could shed some light on it... Thanks, Maxim |
From: Brian F. <bf...@re...> - 2012-12-17 19:02:42
|
On 12/17/2012 09:13 AM, Maxim V. Patlasov wrote: > Hi, > > 12/15/2012 12:16 AM, Brian Foster wrote: >> On 12/14/2012 10:21 AM, Maxim V. Patlasov wrote: ... >>> + >> fuse_do_truncate() looks fairly close to fuse_do_setattr(). Is there any >> reason we couldn't make fuse_do_setattr() non-static, change the dentry >> parameter to an inode and use that? > > fuse_do_setattr() performs extra checks that fuse_do_truncate() needn't. > Some of them are harmless, some not: fuse_allow_task() may return 0 if > task credentials changed. E.g. super-user successfully opened a file, > then setuid(other_user_uid), then write(2) to the file. write(2) doesn't > check uid, but fuse_do_truncate() - via fuse_allow_task() - does. > Conversely, what about the extra error handling bits in fuse_do_setattr() that do not appear in fuse_do_truncate() (i.e., the inode mode check, the change attributes call, updating the inode size, etc.)? It seems like we would want some of that code here. fuse_setattr() is the only caller of fuse_do_setattr(), so why not embed some of the initial checks (such as fuse_allow_task()) there? I suppose we could pull out some of the error handling checks there as well if they are considered harmful to this post-write error truncate situation. FWIW, I just tested a quick change that pulls up the fuse_allow_task() check (via instrumenting a write error) and it seems to work as expected. I can forward a patch if interested... Brian > This non-POSIX behaviour (ftruncate(2) returning -1 with errno==EACCES) > was introduced a long time ago: > >> commit e57ac68378a287d6336d187b26971f35f7ee7251 >> Author: Miklos Szeredi <msz...@su...> >> Date: Thu Oct 18 03:06:58 2007 -0700 >> >> fuse: fix allowing operations >> >> The following operation didn't check if sending the request was >> allowed: >> >> setattr >> listxattr >> statfs >> >> Some other operations don't explicitly do the check, but VFS calls >> ->permission() which checks this. 
>> >> Signed-off-by: Miklos Szeredi <msz...@su...> >> Signed-off-by: Andrew Morton <ak...@li...> >> Signed-off-by: Linus Torvalds <tor...@li...> > > and I'm not sure whether it was done intentionally or not. Maybe Miklos > could shed some light on it... > > Thanks, > Maxim |
From: Maxim V. P. <mpa...@pa...> - 2012-12-18 08:12:01
|
12/17/2012 11:04 PM, Brian Foster wrote: > On 12/17/2012 09:13 AM, Maxim V. Patlasov wrote: >> Hi, >> >> 12/15/2012 12:16 AM, Brian Foster wrote: >>> On 12/14/2012 10:21 AM, Maxim V. Patlasov wrote: > ... >>>> + >>> fuse_do_truncate() looks fairly close to fuse_do_setattr(). Is there any >>> reason we couldn't make fuse_do_setattr() non-static, change the dentry >>> parameter to an inode and use that? >> fuse_do_setattr() performs extra checks that fuse_do_truncate() needn't. >> Some of them are harmless, some not: fuse_allow_task() may return 0 if >> task credentials changed. E.g. super-user successfully opened a file, >> then setuid(other_user_uid), then write(2) to the file. write(2) doesn't >> check uid, but fuse_do_truncate() - via fuse_allow_task() - does. >> > Conversely, what about the extra error handling bits in > fuse_do_setattr() that do not appear in fuse_do_truncate() (i.e., the > inode mode check, the change attributes call, updating the inode size, > etc.)? It seems like we would want some of that code here. Yes, they won't do any harm. > > fuse_setattr() is the only caller of fuse_do_setattr(), so why not embed > some of the initial checks (such as fuse_allow_task()) there? I suppose > we could pull out some of the error handling checks there as well if > they are considered harmful to this post-write error truncate situation. Makes sense. I like it especially because it allows us to avoid code duplication (handling the FUSE_SETATTR request). > FWIW, I just tested a quick change that pulls up the fuse_allow_task() > check (via instrumenting a write error) and it seems to work as > expected. I can forward a patch if interested... I did exactly the same before sending previous email :) In my tests it works as expected too (modulo fuse_allow_task() that we can move up). I'll re-send corrected patch soon. Thanks, Maxim |
From: Brian F. <bf...@re...> - 2012-12-18 14:16:35
|
On 12/14/2012 10:20 AM, Maxim V. Patlasov wrote: > Hi, > ... > The throughput on some commodity (rather feeble) server was (in MB/sec): > > original / patched > > dd reads: ~322 / ~382 > dd writes: ~277 / ~288 > > aio reads: ~380 / ~459 > aio writes: ~319 / ~353 > > Changed in v2 - cleanups suggested by Brian: > - Updated fuse_io_priv with an async field and file pointer to preserve > the current style of interface (i.e., use this instead of iocb). > - Trigger the type of request submission based on the async field. > - Pulled up the fuse_write_update_size() call out of __fuse_direct_write() > to make the separate paths more consistent. > This version plus the updated "fuse: truncate file if async dio failed - v2" patch addresses all the questions I had on the set, so consider it: Reviewed-by: Brian Foster <bf...@re...> I also ran some of your aio/dio performance tests on a basic gluster volume (single client to server) and reproduced the positive results. The results include rewrite numbers (file extending writes generally matched original throughput). Results in MB/s: original / patched 1GigE dd reads: ~74 / ~104 dd rewrites: ~67 / ~103 aio reads: ~53 / ~110 aio rewrites: ~52 / ~112 10GigE dd reads: ~175 / ~437 dd rewrites: ~134 / ~390 aio reads: ~84 / ~417 aio rewrites: ~88 / ~401 Brian > Thanks, > Maxim > > --- > > Maxim V. Patlasov (6): > fuse: move fuse_release_user_pages() up > fuse: add support of async IO > fuse: make fuse_direct_io() aware about AIO > fuse: enable asynchronous processing direct IO > fuse: truncate file if async dio failed > fuse: optimize short direct reads > > > fs/fuse/cuse.c | 6 + > fs/fuse/file.c | 290 +++++++++++++++++++++++++++++++++++++++++++++++------- > fs/fuse/fuse_i.h | 19 +++- > 3 files changed, 276 insertions(+), 39 deletions(-) > |
From: Maxim V. P. <mpa...@pa...> - 2013-04-11 11:22:00
|
Hi Miklos, Any feedback would be highly appreciated. Thanks, Maxim 12/14/2012 07:20 PM, Maxim V. Patlasov wrote: > Hi, > > Existing fuse implementation always processes direct IO synchronously: it > submits next request to userspace fuse only when previous is completed. This > is suboptimal because: 1) libaio DIO works in blocking way; 2) userspace fuse > can't achieve parallelism processing several requests simultaneously (e.g. > in case of distributed network storage); 3) userspace fuse can't merge > requests before passing it to actual storage. > > The idea of the patch-set is to submit fuse requests in non-blocking way > (where it's possible) and either return -EIOCBQUEUED or wait for their > completion synchronously. The patch-set to be applied on top of for-next of > Miklos' git repo. > > To estimate performance improvement I used slightly modified fusexmp over > tmpfs (clearing O_DIRECT bit from fi->flags in xmp_open). For synchronous > operations I used 'dd' like this: > > dd of=/dev/null if=/fuse/mnt/file bs=2M count=256 iflag=direct > dd if=/dev/zero of=/fuse/mnt/file bs=2M count=256 oflag=direct conv=notrunc > > For AIO I used 'aio-stress' like this: > > aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 1 /fuse/mnt/file > aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 0 /fuse/mnt/file > > The throughput on some commodity (rather feeble) server was (in MB/sec): > > original / patched > > dd reads: ~322 / ~382 > dd writes: ~277 / ~288 > > aio reads: ~380 / ~459 > aio writes: ~319 / ~353 > > Changed in v2 - cleanups suggested by Brian: > - Updated fuse_io_priv with an async field and file pointer to preserve > the current style of interface (i.e., use this instead of iocb). > - Trigger the type of request submission based on the async field. > - Pulled up the fuse_write_update_size() call out of __fuse_direct_write() > to make the separate paths more consistent. > > Thanks, > Maxim > > --- > > Maxim V. 
Patlasov (6): > fuse: move fuse_release_user_pages() up > fuse: add support of async IO > fuse: make fuse_direct_io() aware about AIO > fuse: enable asynchronous processing direct IO > fuse: truncate file if async dio failed > fuse: optimize short direct reads > > > fs/fuse/cuse.c | 6 + > fs/fuse/file.c | 290 +++++++++++++++++++++++++++++++++++++++++++++++------- > fs/fuse/fuse_i.h | 19 +++- > 3 files changed, 276 insertions(+), 39 deletions(-) > |
From: Miklos S. <mi...@sz...> - 2013-04-11 16:08:13
|
Hi Maxim, On Thu, Apr 11, 2013 at 1:22 PM, Maxim V. Patlasov <mpa...@pa...> wrote: > Hi Miklos, > > Any feedback would be highly appreciated. What is the order of all these patchsets with regards to each other? Thanks, Miklos |
From: Maxim V. P. <mpa...@pa...> - 2013-04-11 16:43:05
|
Hi, 04/11/2013 08:07 PM, Miklos Szeredi wrote: > Hi Maxim, > > On Thu, Apr 11, 2013 at 1:22 PM, Maxim V. Patlasov > <mpa...@pa...> wrote: >> Hi Miklos, >> >> Any feedback would be highly appreciated. > What is the order of all these patchsets with regards to each other? They are logically independent, so I formed them so that each can be applied without the others. There might be some minor collisions between them (if you try to apply one patch-set on top of another). So, as soon as you get one of them into fuse-next, I'll update the others to apply smoothly. Alternatively, we can settle on some order now, and I'll do it in advance. Thanks, Maxim |
From: Miklos S. <mi...@sz...> - 2013-04-17 20:42:36
|
On Tue, Dec 18, 2012 at 9:12 AM, Maxim V. Patlasov <mpa...@pa...> wrote: > I did exactly the same before sending previous email :) In my tests it works > as expected too (modulo fuse_allow_task() that we can move up). I'll re-send > corrected patch soon. This patch hasn't been re-sent yet. I applied the rest of them in this series (with small changes) and pushed to for-next. Thanks, Miklos |