From: Anand V. A. <av...@re...> - 2012-07-19 09:12:34
|
This patch implements readdirplus support in FUSE, similar to NFS. The payload returned in the readdirplus call contains 'fuse_entry_out' structure thereby providing all the necessary inputs for 'faking' a lookup() operation on the spot. If the dentry and inode already existed (for e.g. in a re-run of ls -l) then just the inode attributes timeout and dentry timeout are refreshed. With a simple client->network->server implementation of a FUSE based filesystem, the following performance observations were made: Test: Performing a filesystem crawl over 20,000 files with sh# time ls -lR /mnt Without readdirplus: Run 1: 18.1s Run 2: 16.0s Run 3: 16.2s With readdirplus: Run 1: 4.1s Run 2: 3.8s Run 3: 3.8s The performance improvement is significant as it avoided 20,000 upcalls calls (lookup). Cache consistency is no worse than what already is. RFC 1. Is it preferred to implement this support as an init flag or autodetect with ENOSYS and fail back to readdir? 2. When an inode link cannot be performed, we need to send a FORGET for the entry's nodeid. However to guarantee delivery of FORGET, we need to allocate fuse_forget_link in prior. However, for readdir like calls we do not know how many entries (and therefore how many fuse_forget_link structures) will be there. The code currently performs a best effort by trying to allocate on demand, however this might be useless because the need to allocate this is very likely because of failure to allocate some other memory. Is it possible somehow to allocate just one fuse_forget_link at the start of the call and queue+wait for every failed inode linkage? Signed-off-by: Anand V. Avati <av...@re...> --- fs/fuse/dir.c | 175 +++++++++++++++++++++++++++++++++++++++++++++++-- fs/fuse/fuse_i.h | 3 + fs/fuse/inode.c | 4 +- include/linux/fuse.h | 15 ++++- 4 files changed, 186 insertions(+), 11 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 334e0b1..6fdb454 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1096,10 +1096,14 @@ static int fuse_permission(struct inode *inode, int mask) static int parse_dirfile(char *buf, size_t nbytes, struct file *file, void *dstbuf, filldir_t filldir) { - while (nbytes >= FUSE_NAME_OFFSET) { - struct fuse_dirent *dirent = (struct fuse_dirent *) buf; - size_t reclen = FUSE_DIRENT_SIZE(dirent); - int over; + struct fuse_dirent *dirent; + size_t reclen; + int over; + + while (nbytes >= FUSE_NAME_OFFSET(dirent)) { + dirent = (struct fuse_dirent *) buf; + reclen = FUSE_DIRENT_SIZE(dirent); + if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX) return -EIO; if (reclen > nbytes) @@ -1118,6 +1122,151 @@ static int parse_dirfile(char *buf, size_t nbytes, struct file *file, return 0; } +void fuse_alloc_and_forget(struct fuse_conn *fc, u64 nodeid) +{ + struct fuse_forget_link *forget; + + forget = fuse_alloc_forget(); + if (!forget) + return; + fuse_queue_forget(fc, forget, nodeid, 1); +} + +static int fuse_direntp_link(struct dentry *parent, + struct fuse_direntp *direntp) +{ + int ret = -1; + struct qstr name = QSTR_INIT(direntp->name, direntp->namelen); + struct dentry *dentry; + struct dentry *alias; + struct inode *dir = parent->d_inode; + struct fuse_conn *fc; + struct fuse_entry_out *feo = &direntp->feo; + struct inode *inode; + struct fuse_inode *fi; + u64 attr_version; + + if (!direntp->feo.nodeid) { + /* Unlike in the case of fuse_lookup, zero nodeid does + not mean ENOENT. Instead, it only means the userspace + filesystem did not want to return attributes/handle for + this entry. + + So do nothing. + */ + return 0; + } + + if (name.name[0] == '.') { + /* We could potentially refresh the attributes of the + directory and its parent? + */ + if (name.len == 1) + return 0; + if (name.name[1] == '.' && name.len == 2) + return 0; + } + name.hash = full_name_hash(name.name, name.len); + + dentry = d_lookup(parent, &name); + if (dentry && dentry->d_inode) { + inode = dentry->d_inode; + if (get_node_id(inode) == direntp->feo.nodeid) { + goto found; + } else { + fuse_alloc_and_forget(fc, get_node_id(inode)); + d_drop(dentry); + dput(dentry); + dentry = NULL; + } + } + + dentry = d_alloc(parent, &name); + if (!dentry) + goto out; + + inode = fuse_iget(dir->i_sb, feo->nodeid, feo->generation, + &feo->attr, entry_attr_timeout(feo), + attr_version); + if (!inode || IS_ERR(inode)) + goto out; + + alias = d_materialise_unique(dentry, inode); + if (IS_ERR(alias)) + goto out; + +found: + fi = get_fuse_inode(inode); + + fc = get_fuse_conn(dir); + + attr_version = fuse_get_attr_version(fc); + + fuse_change_attributes(inode, &feo->attr, + entry_attr_timeout(feo), + attr_version); + if (alias) + fuse_change_entry_timeout(alias, feo); + else + fuse_change_entry_timeout(dentry, feo); + + spin_lock(&fc->lock); + fi->nlookup++; + spin_unlock(&fc->lock); + + ret = 0; +out: + if (dentry) + dput(dentry); + if (alias) + dput(alias); + return ret; +} + +static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file, + void *dstbuf, filldir_t filldir) +{ + struct fuse_direntp *direntp; + size_t reclen; + int over = 0; + int ret; + + while (nbytes >= FUSE_NAME_OFFSET(direntp)) { + direntp = (struct fuse_direntp *) buf; + reclen = FUSE_DIRENT_SIZE(direntp); + + if (!direntp->namelen || direntp->namelen > FUSE_NAME_MAX) + return -EIO; + if (reclen > nbytes) + break; + + if (!over) { + /* We fill entries into dstbuf only as much as + it can hold. But we still continue iterating + over remaining entries to link them. If not, + we need to send a FORGET for each of those + which we did not link. + */ + over = filldir(dstbuf, direntp->name, direntp->namelen, + file->f_pos, direntp->ino, + direntp->type); + file->f_pos = direntp->off; + } + + buf += reclen; + nbytes -= reclen; + + ret = fuse_direntp_link(file->f_path.dentry, direntp); + if (ret) { + struct fuse_conn *fc; + fc = get_fuse_conn(file->f_path.dentry->d_inode); + fuse_alloc_and_forget(fc, direntp->feo.nodeid); + } + } + + return 0; +} + static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) { int err; @@ -1142,14 +1291,24 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) req->out.argpages = 1; req->num_pages = 1; req->pages[0] = page; - fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, FUSE_READDIR); + if (fc->do_readdirplus) + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, + FUSE_READDIRPLUS); + else + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, + FUSE_READDIR); fuse_request_send(fc, req); nbytes = req->out.args[0].size; err = req->out.h.error; fuse_put_request(fc, req); - if (!err) - err = parse_dirfile(page_address(page), nbytes, file, dstbuf, - filldir); + if (!err) { + if (fc->do_readdirplus) + err = parse_dirplusfile(page_address(page), nbytes, + file, dstbuf, filldir); + else + err = parse_dirfile(page_address(page), nbytes, file, + dstbuf, filldir); + } __free_page(page); fuse_invalidate_attr(inode); /* atime changed */ diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 771fb63..2ed4259 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -484,6 +484,9 @@ struct fuse_conn { /** Is fallocate not implemented by fs? */ unsigned no_fallocate:1; + /** Does the filesystem support readdir-plus? */ + unsigned do_readdirplus:1; + /** The number of requests waiting for completion */ atomic_t num_waiting; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 1cd6165..5a10c8c 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -834,6 +834,8 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req) fc->big_writes = 1; if (arg->flags & FUSE_DONT_MASK) fc->dont_mask = 1; + if (arg->flags & FUSE_DO_READDIRPLUS) + fc->do_readdirplus = 1; } else { ra_pages = fc->max_read / PAGE_CACHE_SIZE; fc->no_lock = 1; @@ -859,7 +861,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) arg->max_readahead = fc->bdi.ra_pages * PAGE_CACHE_SIZE; arg->flags |= FUSE_ASYNC_READ | FUSE_POSIX_LOCKS | FUSE_ATOMIC_O_TRUNC | FUSE_EXPORT_SUPPORT | FUSE_BIG_WRITES | FUSE_DONT_MASK | - FUSE_FLOCK_LOCKS; + FUSE_FLOCK_LOCKS | FUSE_DO_READDIRPLUS; req->in.h.opcode = FUSE_INIT; req->in.numargs = 1; req->in.args[0].size = sizeof(*arg); diff --git a/include/linux/fuse.h b/include/linux/fuse.h index 9303348..3be2f39 100644 --- a/include/linux/fuse.h +++ b/include/linux/fuse.h @@ -175,6 +175,7 @@ struct fuse_file_lock { #define FUSE_EXPORT_SUPPORT (1 << 4) #define FUSE_BIG_WRITES (1 << 5) #define FUSE_DONT_MASK (1 << 6) +#define FUSE_DO_READDIRPLUS (1 << 7) #define FUSE_FLOCK_LOCKS (1 << 10) /** @@ -282,6 +283,7 @@ enum fuse_opcode { FUSE_NOTIFY_REPLY = 41, FUSE_BATCH_FORGET = 42, FUSE_FALLOCATE = 43, + FUSE_READDIRPLUS = 44, /* CUSE specific operations */ CUSE_INIT = 4096, @@ -608,10 +610,19 @@ struct fuse_dirent { char name[]; }; -#define FUSE_NAME_OFFSET offsetof(struct fuse_dirent, name) +struct fuse_direntp { + __u64 ino; + __u64 off; + __u32 namelen; + __u32 type; + struct fuse_entry_out feo; + char name[]; +}; + +#define FUSE_NAME_OFFSET(d) offsetof(typeof(*d), name) #define FUSE_DIRENT_ALIGN(x) (((x) + sizeof(__u64) - 1) & ~(sizeof(__u64) - 1)) #define FUSE_DIRENT_SIZE(d) \ - FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET + (d)->namelen) + FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET(d) + (d)->namelen) struct fuse_notify_inval_inode_out { __u64 ino; -- 1.7.4.4 |
From: Anand V. A. <av...@re...> - 2012-07-19 09:42:36
|
This patch implements readdirplus support in FUSE, similar to NFS. The payload returned in the readdirplus call contains 'fuse_entry_out' structure thereby providing all the necessary inputs for 'faking' a lookup() operation on the spot. If the dentry and inode already existed (for e.g. in a re-run of ls -l) then just the inode attributes timeout and dentry timeout are refreshed. With a simple client->network->server implementation of a FUSE based filesystem, the following performance observations were made: Test: Performing a filesystem crawl over 20,000 files with sh# time ls -lR /mnt Without readdirplus: Run 1: 18.1s Run 2: 16.0s Run 3: 16.2s With readdirplus: Run 1: 4.1s Run 2: 3.8s Run 3: 3.8s The performance improvement is significant as it avoided 20,000 upcalls calls (lookup). Cache consistency is no worse than what already is. RFC 1. Is it preferred to implement this support as an init flag or autodetect with ENOSYS and fail back to readdir? 2. When an inode link cannot be performed, we need to send a FORGET for the entry's nodeid. However to guarantee delivery of FORGET, we need to allocate fuse_forget_link in prior. However, for readdir like calls we do not know how many entries (and therefore how many fuse_forget_link structures) will be there. The code currently performs a best effort by trying to allocate on demand, however this might be useless because the need to allocate this is very likely because of failure to allocate some other memory. Is it possible somehow to allocate just one fuse_forget_link at the start of the call and queue+wait for every failed inode linkage? Signed-off-by: Anand V. Avati <av...@re...> --- fs/fuse/dir.c | 175 +++++++++++++++++++++++++++++++++++++++++++++++-- fs/fuse/fuse_i.h | 3 + fs/fuse/inode.c | 4 +- include/linux/fuse.h | 15 ++++- 4 files changed, 186 insertions(+), 11 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 334e0b1..6fdb454 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1096,10 +1096,14 @@ static int fuse_permission(struct inode *inode, int mask) static int parse_dirfile(char *buf, size_t nbytes, struct file *file, void *dstbuf, filldir_t filldir) { - while (nbytes >= FUSE_NAME_OFFSET) { - struct fuse_dirent *dirent = (struct fuse_dirent *) buf; - size_t reclen = FUSE_DIRENT_SIZE(dirent); - int over; + struct fuse_dirent *dirent; + size_t reclen; + int over; + + while (nbytes >= FUSE_NAME_OFFSET(dirent)) { + dirent = (struct fuse_dirent *) buf; + reclen = FUSE_DIRENT_SIZE(dirent); + if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX) return -EIO; if (reclen > nbytes) @@ -1118,6 +1122,151 @@ static int parse_dirfile(char *buf, size_t nbytes, struct file *file, return 0; } +void fuse_alloc_and_forget(struct fuse_conn *fc, u64 nodeid) +{ + struct fuse_forget_link *forget; + + forget = fuse_alloc_forget(); + if (!forget) + return; + fuse_queue_forget(fc, forget, nodeid, 1); +} + +static int fuse_direntp_link(struct dentry *parent, + struct fuse_direntp *direntp) +{ + int ret = -1; + struct qstr name = QSTR_INIT(direntp->name, direntp->namelen); + struct dentry *dentry; + struct dentry *alias; + struct inode *dir = parent->d_inode; + struct fuse_conn *fc; + struct fuse_entry_out *feo = &direntp->feo; + struct inode *inode; + struct fuse_inode *fi; + u64 attr_version; + + if (!direntp->feo.nodeid) { + /* Unlike in the case of fuse_lookup, zero nodeid does + not mean ENOENT. Instead, it only means the userspace + filesystem did not want to return attributes/handle for + this entry. + + So do nothing. + */ + return 0; + } + + if (name.name[0] == '.') { + /* We could potentially refresh the attributes of the + directory and its parent? + */ + if (name.len == 1) + return 0; + if (name.name[1] == '.' && name.len == 2) + return 0; + } + name.hash = full_name_hash(name.name, name.len); + + dentry = d_lookup(parent, &name); + if (dentry && dentry->d_inode) { + inode = dentry->d_inode; + if (get_node_id(inode) == direntp->feo.nodeid) { + goto found; + } else { + fuse_alloc_and_forget(fc, get_node_id(inode)); + d_drop(dentry); + dput(dentry); + dentry = NULL; + } + } + + dentry = d_alloc(parent, &name); + if (!dentry) + goto out; + + inode = fuse_iget(dir->i_sb, feo->nodeid, feo->generation, + &feo->attr, entry_attr_timeout(feo), + attr_version); + if (!inode || IS_ERR(inode)) + goto out; + + alias = d_materialise_unique(dentry, inode); + if (IS_ERR(alias)) + goto out; + +found: + fi = get_fuse_inode(inode); + + fc = get_fuse_conn(dir); + + attr_version = fuse_get_attr_version(fc); + + fuse_change_attributes(inode, &feo->attr, + entry_attr_timeout(feo), + attr_version); + if (alias) + fuse_change_entry_timeout(alias, feo); + else + fuse_change_entry_timeout(dentry, feo); + + spin_lock(&fc->lock); + fi->nlookup++; + spin_unlock(&fc->lock); + + ret = 0; +out: + if (dentry) + dput(dentry); + if (alias) + dput(alias); + return ret; +} + +static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file, + void *dstbuf, filldir_t filldir) +{ + struct fuse_direntp *direntp; + size_t reclen; + int over = 0; + int ret; + + while (nbytes >= FUSE_NAME_OFFSET(direntp)) { + direntp = (struct fuse_direntp *) buf; + reclen = FUSE_DIRENT_SIZE(direntp); + + if (!direntp->namelen || direntp->namelen > FUSE_NAME_MAX) + return -EIO; + if (reclen > nbytes) + break; + + if (!over) { + /* We fill entries into dstbuf only as much as + it can hold. But we still continue iterating + over remaining entries to link them. If not, + we need to send a FORGET for each of those + which we did not link. + */ + over = filldir(dstbuf, direntp->name, direntp->namelen, + file->f_pos, direntp->ino, + direntp->type); + file->f_pos = direntp->off; + } + + buf += reclen; + nbytes -= reclen; + + ret = fuse_direntp_link(file->f_path.dentry, direntp); + if (ret) { + struct fuse_conn *fc; + fc = get_fuse_conn(file->f_path.dentry->d_inode); + fuse_alloc_and_forget(fc, direntp->feo.nodeid); + } + } + + return 0; +} + static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) { int err; @@ -1142,14 +1291,24 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) req->out.argpages = 1; req->num_pages = 1; req->pages[0] = page; - fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, FUSE_READDIR); + if (fc->do_readdirplus) + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, + FUSE_READDIRPLUS); + else + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, + FUSE_READDIR); fuse_request_send(fc, req); nbytes = req->out.args[0].size; err = req->out.h.error; fuse_put_request(fc, req); - if (!err) - err = parse_dirfile(page_address(page), nbytes, file, dstbuf, - filldir); + if (!err) { + if (fc->do_readdirplus) + err = parse_dirplusfile(page_address(page), nbytes, + file, dstbuf, filldir); + else + err = parse_dirfile(page_address(page), nbytes, file, + dstbuf, filldir); + } __free_page(page); fuse_invalidate_attr(inode); /* atime changed */ diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 771fb63..2ed4259 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -484,6 +484,9 @@ struct fuse_conn { /** Is fallocate not implemented by fs? */ unsigned no_fallocate:1; + /** Does the filesystem support readdir-plus? */ + unsigned do_readdirplus:1; + /** The number of requests waiting for completion */ atomic_t num_waiting; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 1cd6165..5a10c8c 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -834,6 +834,8 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req) fc->big_writes = 1; if (arg->flags & FUSE_DONT_MASK) fc->dont_mask = 1; + if (arg->flags & FUSE_DO_READDIRPLUS) + fc->do_readdirplus = 1; } else { ra_pages = fc->max_read / PAGE_CACHE_SIZE; fc->no_lock = 1; @@ -859,7 +861,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) arg->max_readahead = fc->bdi.ra_pages * PAGE_CACHE_SIZE; arg->flags |= FUSE_ASYNC_READ | FUSE_POSIX_LOCKS | FUSE_ATOMIC_O_TRUNC | FUSE_EXPORT_SUPPORT | FUSE_BIG_WRITES | FUSE_DONT_MASK | - FUSE_FLOCK_LOCKS; + FUSE_FLOCK_LOCKS | FUSE_DO_READDIRPLUS; req->in.h.opcode = FUSE_INIT; req->in.numargs = 1; req->in.args[0].size = sizeof(*arg); diff --git a/include/linux/fuse.h b/include/linux/fuse.h index 9303348..3be2f39 100644 --- a/include/linux/fuse.h +++ b/include/linux/fuse.h @@ -175,6 +175,7 @@ struct fuse_file_lock { #define FUSE_EXPORT_SUPPORT (1 << 4) #define FUSE_BIG_WRITES (1 << 5) #define FUSE_DONT_MASK (1 << 6) +#define FUSE_DO_READDIRPLUS (1 << 7) #define FUSE_FLOCK_LOCKS (1 << 10) /** @@ -282,6 +283,7 @@ enum fuse_opcode { FUSE_NOTIFY_REPLY = 41, FUSE_BATCH_FORGET = 42, FUSE_FALLOCATE = 43, + FUSE_READDIRPLUS = 44, /* CUSE specific operations */ CUSE_INIT = 4096, @@ -608,10 +610,19 @@ struct fuse_dirent { char name[]; }; -#define FUSE_NAME_OFFSET offsetof(struct fuse_dirent, name) +struct fuse_direntp { + __u64 ino; + __u64 off; + __u32 namelen; + __u32 type; + struct fuse_entry_out feo; + char name[]; +}; + +#define FUSE_NAME_OFFSET(d) offsetof(typeof(*d), name) #define FUSE_DIRENT_ALIGN(x) (((x) + sizeof(__u64) - 1) & ~(sizeof(__u64) - 1)) #define FUSE_DIRENT_SIZE(d) \ - FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET + (d)->namelen) + FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET(d) + (d)->namelen) struct fuse_notify_inval_inode_out { __u64 ino; -- 1.7.4.4 |
From: Miklos S. <mi...@sz...> - 2012-07-31 16:34:10
|
"Anand V. Avati" <av...@re...> writes: > This patch implements readdirplus support in FUSE, similar to NFS. > The payload returned in the readdirplus call contains > 'fuse_entry_out' structure thereby providing all the necessary inputs > for 'faking' a lookup() operation on the spot. > > If the dentry and inode already existed (for e.g. in a re-run of ls -l) > then just the inode attributes timeout and dentry timeout are refreshed. > > With a simple client->network->server implementation of a FUSE based > filesystem, the following performance observations were made: > > Test: Performing a filesystem crawl over 20,000 files with > > sh# time ls -lR /mnt > > Without readdirplus: > Run 1: 18.1s > Run 2: 16.0s > Run 3: 16.2s > > With readdirplus: > Run 1: 4.1s > Run 2: 3.8s > Run 3: 3.8s Cool. > > The performance improvement is significant as it avoided 20,000 upcalls > calls (lookup). Cache consistency is no worse than what already is. > > RFC > 1. Is it preferred to implement this support as an init flag or > autodetect with ENOSYS and fail back to readdir? Using an init flag is fine. It's non trivial to do the fallback and really not necessary since the filesystem can tell up front that it wants the readdirplus interface. > 2. When an inode link cannot be performed, we need to send a FORGET for > the entry's nodeid. However to guarantee delivery of FORGET, we need to > allocate fuse_forget_link in prior. However, for readdir like calls we > do not know how many entries (and therefore how many fuse_forget_link > structures) will be there. The code currently performs a best effort > by trying to allocate on demand, however this might be useless because > the need to allocate this is very likely because of failure to allocate > some other memory. Is it possible somehow to allocate just one > fuse_forget_link at the start of the call and queue+wait for every failed > inode linkage? You can use fuse_get_req_nofail() here and send the forget request by hand. See more comments inline. > Signed-off-by: Anand V. Avati <av...@re...> > --- > fs/fuse/dir.c | 175 +++++++++++++++++++++++++++++++++++++++++++++++-- > fs/fuse/fuse_i.h | 3 + > fs/fuse/inode.c | 4 +- > include/linux/fuse.h | 15 ++++- > 4 files changed, 186 insertions(+), 11 deletions(-) > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > index 334e0b1..6fdb454 100644 > --- a/fs/fuse/dir.c > +++ b/fs/fuse/dir.c > @@ -1096,10 +1096,14 @@ static int fuse_permission(struct inode *inode, int mask) > static int parse_dirfile(char *buf, size_t nbytes, struct file *file, > void *dstbuf, filldir_t filldir) > { > - while (nbytes >= FUSE_NAME_OFFSET) { > - struct fuse_dirent *dirent = (struct fuse_dirent *) buf; > - size_t reclen = FUSE_DIRENT_SIZE(dirent); > - int over; > + struct fuse_dirent *dirent; > + size_t reclen; > + int over; > + > + while (nbytes >= FUSE_NAME_OFFSET(dirent)) { > + dirent = (struct fuse_dirent *) buf; > + reclen = FUSE_DIRENT_SIZE(dirent); > + What's this hunk about? It didn't change anything AFAICS. > if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX) > return -EIO; > if (reclen > nbytes) > @@ -1118,6 +1122,151 @@ static int parse_dirfile(char *buf, size_t nbytes, struct file *file, > return 0; > } > > +void fuse_alloc_and_forget(struct fuse_conn *fc, u64 nodeid) > +{ > + struct fuse_forget_link *forget; > + > + forget = fuse_alloc_forget(); > + if (!forget) > + return; > + fuse_queue_forget(fc, forget, nodeid, 1); > +} > + > +static int fuse_direntp_link(struct dentry *parent, > + struct fuse_direntp *direntp) > +{ > + int ret = -1; > + struct qstr name = QSTR_INIT(direntp->name, direntp->namelen); > + struct dentry *dentry; > + struct dentry *alias; > + struct inode *dir = parent->d_inode; > + struct fuse_conn *fc; > + struct fuse_entry_out *feo = &direntp->feo; > + struct inode *inode; > + struct fuse_inode *fi; > + u64 attr_version; > + > + if (!direntp->feo.nodeid) { > + /* Unlike in the case of fuse_lookup, zero nodeid does > + not mean ENOENT. Instead, it only means the userspace > + filesystem did not want to return attributes/handle for > + this entry. > + > + So do nothing. > + */ > + return 0; > + } > + > + if (name.name[0] == '.') { > + /* We could potentially refresh the attributes of the > + directory and its parent? > + */ > + if (name.len == 1) > + return 0; > + if (name.name[1] == '.' && name.len == 2) > + return 0; > + } > + name.hash = full_name_hash(name.name, name.len); > + > + dentry = d_lookup(parent, &name); > + if (dentry && dentry->d_inode) { > + inode = dentry->d_inode; > + if (get_node_id(inode) == direntp->feo.nodeid) { > + goto found; > + } else { > + fuse_alloc_and_forget(fc, get_node_id(inode)); > + d_drop(dentry); I see what you are trying to do here, but doing d_drop() like that is not a good idea. What you want here is more like the revalidate/ invalidate logic on lookup. > + dput(dentry); > + dentry = NULL; > + } > + } > + > + dentry = d_alloc(parent, &name); > + if (!dentry) > + goto out; > + > + inode = fuse_iget(dir->i_sb, feo->nodeid, feo->generation, > + &feo->attr, entry_attr_timeout(feo), > + attr_version); > + if (!inode || IS_ERR(inode)) > + goto out; > + > + alias = d_materialise_unique(dentry, inode); > + if (IS_ERR(alias)) > + goto out; Why not set "dentry" to "alias" if non-NULL? Then later you won't need the "if (alias) ..." conditions. > + > +found: > + fi = get_fuse_inode(inode); > + > + fc = get_fuse_conn(dir); > + > + attr_version = fuse_get_attr_version(fc); > + > + fuse_change_attributes(inode, &feo->attr, > + entry_attr_timeout(feo), > + attr_version); > + if (alias) > + fuse_change_entry_timeout(alias, feo); > + else > + fuse_change_entry_timeout(dentry, feo); > + > + spin_lock(&fc->lock); > + fi->nlookup++; > + spin_unlock(&fc->lock); > + > + ret = 0; > +out: > + if (dentry) > + dput(dentry); > + if (alias) > + dput(alias); > + return ret; > +} > + > +static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file, > + void *dstbuf, filldir_t filldir) > +{ > + struct fuse_direntp *direntp; > + size_t reclen; > + int over = 0; > + int ret; > + > + while (nbytes >= FUSE_NAME_OFFSET(direntp)) { > + direntp = (struct fuse_direntp *) buf; > + reclen = FUSE_DIRENT_SIZE(direntp); > + > + if (!direntp->namelen || direntp->namelen > FUSE_NAME_MAX) > + return -EIO; > + if (reclen > nbytes) > + break; > + > + if (!over) { > + /* We fill entries into dstbuf only as much as > + it can hold. But we still continue iterating > + over remaining entries to link them. If not, > + we need to send a FORGET for each of those > + which we did not link. > + */ > + over = filldir(dstbuf, direntp->name, direntp->namelen, > + file->f_pos, direntp->ino, > + direntp->type); > + file->f_pos = direntp->off; > + } > + > + buf += reclen; > + nbytes -= reclen; > + > + ret = fuse_direntp_link(file->f_path.dentry, direntp); > + if (ret) { > + struct fuse_conn *fc; > + fc = get_fuse_conn(file->f_path.dentry->d_inode); > + fuse_alloc_and_forget(fc, direntp->feo.nodeid); > + } > + } > + > + return 0; > +} > + > static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) > { > int err; > @@ -1142,14 +1291,24 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) > req->out.argpages = 1; > req->num_pages = 1; > req->pages[0] = page; > - fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, FUSE_READDIR); > + if (fc->do_readdirplus) > + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, > + FUSE_READDIRPLUS); > + else > + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, > + FUSE_READDIR); > fuse_request_send(fc, req); > nbytes = req->out.args[0].size; > err = req->out.h.error; > fuse_put_request(fc, req); > - if (!err) > - err = parse_dirfile(page_address(page), nbytes, file, dstbuf, > - filldir); > + if (!err) { > + if (fc->do_readdirplus) > + err = parse_dirplusfile(page_address(page), nbytes, > + file, dstbuf, filldir); > + else > + err = parse_dirfile(page_address(page), nbytes, file, > + dstbuf, filldir); > + } > > __free_page(page); > fuse_invalidate_attr(inode); /* atime changed */ > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > index 771fb63..2ed4259 100644 > --- a/fs/fuse/fuse_i.h > +++ b/fs/fuse/fuse_i.h > @@ -484,6 +484,9 @@ struct fuse_conn { > /** Is fallocate not implemented by fs? */ > unsigned no_fallocate:1; > > + /** Does the filesystem support readdir-plus? */ > + unsigned do_readdirplus:1; > + > /** The number of requests waiting for completion */ > atomic_t num_waiting; > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > index 1cd6165..5a10c8c 100644 > --- a/fs/fuse/inode.c > +++ b/fs/fuse/inode.c > @@ -834,6 +834,8 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req) > fc->big_writes = 1; > if (arg->flags & FUSE_DONT_MASK) > fc->dont_mask = 1; > + if (arg->flags & FUSE_DO_READDIRPLUS) > + fc->do_readdirplus = 1; > } else { > ra_pages = fc->max_read / PAGE_CACHE_SIZE; > fc->no_lock = 1; > @@ -859,7 +861,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) > arg->max_readahead = fc->bdi.ra_pages * PAGE_CACHE_SIZE; > arg->flags |= FUSE_ASYNC_READ | FUSE_POSIX_LOCKS | FUSE_ATOMIC_O_TRUNC | > FUSE_EXPORT_SUPPORT | FUSE_BIG_WRITES | FUSE_DONT_MASK | > - FUSE_FLOCK_LOCKS; > + FUSE_FLOCK_LOCKS | FUSE_DO_READDIRPLUS; > req->in.h.opcode = FUSE_INIT; > req->in.numargs = 1; > req->in.args[0].size = sizeof(*arg); > diff --git a/include/linux/fuse.h b/include/linux/fuse.h > index 9303348..3be2f39 100644 > --- a/include/linux/fuse.h > +++ b/include/linux/fuse.h > @@ -175,6 +175,7 @@ struct fuse_file_lock { > #define FUSE_EXPORT_SUPPORT (1 << 4) > #define FUSE_BIG_WRITES (1 << 5) > #define FUSE_DONT_MASK (1 << 6) > +#define FUSE_DO_READDIRPLUS (1 << 7) > #define FUSE_FLOCK_LOCKS (1 << 10) Please rebase against git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next > > /** > @@ -282,6 +283,7 @@ enum fuse_opcode { > FUSE_NOTIFY_REPLY = 41, > FUSE_BATCH_FORGET = 42, > FUSE_FALLOCATE = 43, > + FUSE_READDIRPLUS = 44, > > /* CUSE specific operations */ > CUSE_INIT = 4096, > @@ -608,10 +610,19 @@ struct fuse_dirent { > char name[]; > }; > > -#define FUSE_NAME_OFFSET offsetof(struct fuse_dirent, name) > +struct fuse_direntp { > + __u64 ino; > + __u64 off; > + __u32 namelen; > + __u32 type; > + struct fuse_entry_out feo; > + char name[]; > +}; > + > +#define FUSE_NAME_OFFSET(d) offsetof(typeof(*d), name) FUSE_NAME_OFFSET (without an argument) is used by userspace code. So please don't change it. Create a new define instead. > #define FUSE_DIRENT_ALIGN(x) (((x) + sizeof(__u64) - 1) & ~(sizeof(__u64) - 1)) > #define FUSE_DIRENT_SIZE(d) \ > - FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET + (d)->namelen) > + FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET(d) + (d)->namelen) > > struct fuse_notify_inval_inode_out { > __u64 ino; Thanks, Miklos |
From: Anand A. <av...@re...> - 2012-08-07 10:54:47
|
On 07/31/2012 09:35 AM, Miklos Szeredi wrote: >> >> The performance improvement is significant as it avoided 20,000 upcalls >> calls (lookup). Cache consistency is no worse than what already is. >> >> RFC >> 1. Is it preferred to implement this support as an init flag or >> autodetect with ENOSYS and fail back to readdir? > > Using an init flag is fine. It's non trivial to do the fallback and > really not necessary since the filesystem can tell up front that it > wants the readdirplus interface. OK. I was looking at fuse_access_open(), fuse_setxattr() etc. and thought it would not be too difficult if an init flag bit-space is considered expensive. > >> 2. When an inode link cannot be performed, we need to send a FORGET for >> the entry's nodeid. However to guarantee delivery of FORGET, we need to >> allocate fuse_forget_link in prior. However, for readdir like calls we >> do not know how many entries (and therefore how many fuse_forget_link >> structures) will be there. The code currently performs a best effort >> by trying to allocate on demand, however this might be useless because >> the need to allocate this is very likely because of failure to allocate >> some other memory. Is it possible somehow to allocate just one >> fuse_forget_link at the start of the call and queue+wait for every failed >> inode linkage? > > You can use fuse_get_req_nofail() here and send the forget request by > hand. Thanks for the pointer. >> @@ -1096,10 +1096,14 @@ static int fuse_permission(struct inode *inode, int mask) >> static int parse_dirfile(char *buf, size_t nbytes, struct file *file, >> void *dstbuf, filldir_t filldir) >> { >> - while (nbytes >= FUSE_NAME_OFFSET) { >> - struct fuse_dirent *dirent = (struct fuse_dirent *) buf; >> - size_t reclen = FUSE_DIRENT_SIZE(dirent); >> - int over; >> + struct fuse_dirent *dirent; >> + size_t reclen; >> + int over; >> + >> + while (nbytes >= FUSE_NAME_OFFSET(dirent)) { >> + dirent = (struct fuse_dirent *) buf; >> + reclen = FUSE_DIRENT_SIZE(dirent); >> + > > What's this hunk about? It didn't change anything AFAICS. This was necessary as dirent had to be a macro parameter in the while() condition (hence had to be declared at the top of the function). Based on your other comment (of not modifying FUSE_NAME_OFFSET) this change will become unnecessary. >> + dentry = d_lookup(parent, &name); >> + if (dentry && dentry->d_inode) { >> + inode = dentry->d_inode; >> + if (get_node_id(inode) == direntp->feo.nodeid) { >> + goto found; >> + } else { >> + fuse_alloc_and_forget(fc, get_node_id(inode)); >> + d_drop(dentry); > > I see what you are trying to do here, but doing d_drop() like that is > not a good idea. What you want here is more like the revalidate/ > invalidate logic on lookup. OK. Do you mean we just do a d_invalidate() or fuse_invalidate_entry() of the old dentry (instead of d_drop)? After looking at how dcache.c is handling revalidates, it looked that that would be the only crux of the change. It also appears that NFS seems to do a d_drop() in similar situation. Why is FUSE different for this? > > > >> + dput(dentry); >> + dentry = NULL; >> + } >> + } >> + >> + dentry = d_alloc(parent, &name); >> + if (!dentry) >> + goto out; >> + >> + inode = fuse_iget(dir->i_sb, feo->nodeid, feo->generation, >> + &feo->attr, entry_attr_timeout(feo), >> + attr_version); >> + if (!inode || IS_ERR(inode)) >> + goto out; >> + >> + alias = d_materialise_unique(dentry, inode); >> + if (IS_ERR(alias)) >> + goto out; > > > Why not set "dentry" to "alias" if non-NULL? Then later you won't need > the "if (alias) ..." conditions. OK. > >> + >> +found: >> + fi = get_fuse_inode(inode); >> + >> + fc = get_fuse_conn(dir); >> + >> + attr_version = fuse_get_attr_version(fc); >> + >> + fuse_change_attributes(inode, &feo->attr, >> + entry_attr_timeout(feo), >> + attr_version); >> + if (alias) >> + fuse_change_entry_timeout(alias, feo); >> + else >> + fuse_change_entry_timeout(dentry, feo); >> + >> + spin_lock(&fc->lock); >> + fi->nlookup++; >> + spin_unlock(&fc->lock); >> + >> + ret = 0; >> +out: >> + if (dentry) >> + dput(dentry); >> + if (alias) >> + dput(alias); >> + return ret; >> +} >> + >> +static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file, >> + void *dstbuf, filldir_t filldir) >> +{ >> + struct fuse_direntp *direntp; >> + size_t reclen; >> + int over = 0; >> + int ret; >> + >> + while (nbytes >= FUSE_NAME_OFFSET(direntp)) { >> + direntp = (struct fuse_direntp *) buf; >> + reclen = FUSE_DIRENT_SIZE(direntp); >> + >> + if (!direntp->namelen || direntp->namelen > FUSE_NAME_MAX) >> + return -EIO; >> + if (reclen > nbytes) >> + break; >> + >> + if (!over) { >> + /* We fill entries into dstbuf only as much as >> + it can hold. But we still continue iterating >> + over remaining entries to link them. If not, >> + we need to send a FORGET for each of those >> + which we did not link. >> + */ >> + over = filldir(dstbuf, direntp->name, direntp->namelen, >> + file->f_pos, direntp->ino, >> + direntp->type); >> + file->f_pos = direntp->off; >> + } >> + >> + buf += reclen; >> + nbytes -= reclen; >> + >> + ret = fuse_direntp_link(file->f_path.dentry, direntp); >> + if (ret) { >> + struct fuse_conn *fc; >> + fc = get_fuse_conn(file->f_path.dentry->d_inode); >> + fuse_alloc_and_forget(fc, direntp->feo.nodeid); >> + } >> + } >> + >> + return 0; >> +} >> + >> static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) >> { >> int err; >> @@ -1142,14 +1291,24 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) >> req->out.argpages = 1; >> req->num_pages = 1; >> req->pages[0] = page; >> - fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, FUSE_READDIR); >> + if (fc->do_readdirplus) >> + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, >> + FUSE_READDIRPLUS); >> + else >> + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, >> + FUSE_READDIR); >> fuse_request_send(fc, req); >> nbytes = req->out.args[0].size; >> err = req->out.h.error; >> fuse_put_request(fc, req); >> - if (!err) >> - err = parse_dirfile(page_address(page), nbytes, file, dstbuf, >> - filldir); >> + if (!err) { >> + if (fc->do_readdirplus) >> + err = parse_dirplusfile(page_address(page), nbytes, >> + file, dstbuf, filldir); >> + else >> + err = parse_dirfile(page_address(page), nbytes, file, >> + dstbuf, filldir); >> + } >> >> __free_page(page); >> fuse_invalidate_attr(inode); /* atime changed */ >> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h >> index 771fb63..2ed4259 100644 >> --- a/fs/fuse/fuse_i.h >> +++ b/fs/fuse/fuse_i.h >> @@ -484,6 +484,9 @@ struct fuse_conn { >> /** Is fallocate not implemented by fs? */ >> unsigned no_fallocate:1; >> >> + /** Does the filesystem support readdir-plus? */ >> + unsigned do_readdirplus:1; >> + >> /** The number of requests waiting for completion */ >> atomic_t num_waiting; >> >> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c >> index 1cd6165..5a10c8c 100644 >> --- a/fs/fuse/inode.c >> +++ b/fs/fuse/inode.c >> @@ -834,6 +834,8 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req) >> fc->big_writes = 1; >> if (arg->flags & FUSE_DONT_MASK) >> fc->dont_mask = 1; >> + if (arg->flags & FUSE_DO_READDIRPLUS) >> + fc->do_readdirplus = 1; >> } else { >> ra_pages = fc->max_read / PAGE_CACHE_SIZE; >> fc->no_lock = 1; >> @@ -859,7 +861,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) >> arg->max_readahead = fc->bdi.ra_pages * PAGE_CACHE_SIZE; >> arg->flags |= FUSE_ASYNC_READ | FUSE_POSIX_LOCKS | FUSE_ATOMIC_O_TRUNC | >> FUSE_EXPORT_SUPPORT | FUSE_BIG_WRITES | FUSE_DONT_MASK | >> - FUSE_FLOCK_LOCKS; >> + FUSE_FLOCK_LOCKS | FUSE_DO_READDIRPLUS; >> req->in.h.opcode = FUSE_INIT; >> req->in.numargs = 1; >> req->in.args[0].size = sizeof(*arg); >> diff --git a/include/linux/fuse.h b/include/linux/fuse.h >> index 9303348..3be2f39 100644 >> --- a/include/linux/fuse.h >> +++ b/include/linux/fuse.h >> @@ -175,6 +175,7 @@ struct fuse_file_lock { >> #define FUSE_EXPORT_SUPPORT (1 << 4) >> #define FUSE_BIG_WRITES (1 << 5) >> #define FUSE_DONT_MASK (1 << 6) >> +#define FUSE_DO_READDIRPLUS (1 << 7) >> #define FUSE_FLOCK_LOCKS (1 << 10) > > Please rebase against > > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next > >> >> /** >> @@ -282,6 +283,7 @@ enum fuse_opcode { >> FUSE_NOTIFY_REPLY = 41, >> FUSE_BATCH_FORGET = 42, >> FUSE_FALLOCATE = 43, >> + FUSE_READDIRPLUS = 44, >> >> /* CUSE specific operations */ >> CUSE_INIT = 4096, >> @@ -608,10 +610,19 @@ struct fuse_dirent { >> char name[]; >> }; >> >> -#define FUSE_NAME_OFFSET offsetof(struct fuse_dirent, name) >> +struct fuse_direntp { >> + __u64 ino; >> + __u64 off; >> + __u32 namelen; >> + __u32 type; >> + struct fuse_entry_out feo; >> + char name[]; >> +}; >> + >> +#define FUSE_NAME_OFFSET(d) offsetof(typeof(*d), name) > > FUSE_NAME_OFFSET (without an argument) is used by userspace code. So > please don't change it. Create a new define instead. OK. Thanks for the review. Avati >> #define FUSE_DIRENT_ALIGN(x) (((x) + sizeof(__u64) - 1) & ~(sizeof(__u64) - 1)) >> #define FUSE_DIRENT_SIZE(d) \ >> - FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET + (d)->namelen) >> + FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET(d) + (d)->namelen) >> >> struct fuse_notify_inval_inode_out { >> __u64 ino; > > > Thanks, > Miklos > |
From: Miklos S. <mi...@sz...> - 2012-08-08 15:27:50
|
Anand Avati <av...@re...> writes: >>> + dentry = d_lookup(parent, &name); >>> + if (dentry && dentry->d_inode) { >>> + inode = dentry->d_inode; >>> + if (get_node_id(inode) == direntp->feo.nodeid) { >>> + goto found; >>> + } else { >>> + fuse_alloc_and_forget(fc, get_node_id(inode)); >>> + d_drop(dentry); >> >> I see what you are trying to do here, but doing d_drop() like that is >> not a good idea. What you want here is more like the revalidate/ >> invalidate logic on lookup. > > OK. Do you mean we just do a d_invalidate() or fuse_invalidate_entry() of the > old dentry (instead of d_drop)? After looking at how dcache.c is handling > revalidates, it looked that that would be the only crux of the change. Yes. > It also appears that NFS seems to do a d_drop() in similar situation. Why is > FUSE different for this? Doing d_drop on tree with active references (e.g. submounts) may have undesirable consequences (can't umount because it's unreachable). NFS may be buggy. In its revalidate it checks submounts with have_submounts() before doing d_drop(). It doesn't appear to do that in readdirplus. But I haven't looked very deeply. Thanks, Miklos |
From: Stef B. <st...@gm...> - 2012-08-09 17:23:19
|
2012/7/19 Anand V. Avati <av...@re...>: > This patch implements readdirplus support in FUSE, similar to NFS. > The payload returned in the readdirplus call contains > 'fuse_entry_out' structure thereby providing all the necessary inputs > for 'faking' a lookup() operation on the spot. > Hi, it's good idea, but how is the lookup processed? Normally the readdir provides the list of ino's and the names, and the calling process (or is it the VFS?) does a lookup of all the individual inodes. I guess it's the VFS. And why is the decision taken to do a lookup? The timeout has expired? Stef |
From: Stef B. <st...@gm...> - 2012-08-14 08:13:09
|
Hi, isnt it a good idea to have a call which refreshes the attributes of an entry in the kernel cache. It's then possible to use this call for: . readdir plus constructions, where a readdir pushes the attributes to prevent a lookup . fs notify constructions, where the fuse fs detects a change on the backend implementation specific, and pushes the change to the vfs. There are several calls, but these are about delete/invalidate, not about a new right value. Stef 2012/8/9 Stef Bon <st...@gm...>: > 2012/7/19 Anand V. Avati <av...@re...>: >> This patch implements readdirplus support in FUSE, similar to NFS. >> The payload returned in the readdirplus call contains >> 'fuse_entry_out' structure thereby providing all the necessary inputs >> for 'faking' a lookup() operation on the spot. >> > > Hi, > > it's good idea, but how is the lookup processed? Normally the readdir > provides the list of ino's and the names, and the calling process (or > is it the VFS?) does a lookup of all the individual inodes. I guess > it's the VFS. And why is the decision taken to do a lookup? The > timeout has expired? > > Stef |
From: Anand A. <av...@re...> - 2012-08-14 09:14:24
|
On 08/14/2012 01:13 AM, Stef Bon wrote: > Hi, > > isnt it a good idea to have a call which refreshes the attributes of > an entry in the kernel cache. It's then possible to use this call for: > > . readdir plus constructions, where a readdir pushes the attributes to > prevent a lookup Wouldn't that involve one "push" call to update the attribute of one entry? The readdirplus approach refreshes attributes of all the returned entries with the same reply. > . fs notify constructions, where the fuse fs detects a change on the > backend implementation specific, and pushes the change to the vfs. This is a possibility, but I consider it outside the scope of this patch - as a separate effort. The strategy to use such a call is more complex and FS implementation specific, compared to readdirplus() which offers reasonable performance boost with a very little increase in complexity. > > 2012/8/9 Stef Bon <st...@gm...>: >> 2012/7/19 Anand V. Avati <av...@re...>: >>> This patch implements readdirplus support in FUSE, similar to NFS. >>> The payload returned in the readdirplus call contains >>> 'fuse_entry_out' structure thereby providing all the necessary inputs >>> for 'faking' a lookup() operation on the spot. >>> >> >> Hi, >> >> it's good idea, but how is the lookup processed? Normally the readdir >> provides the list of ino's and the names, and the calling process (or >> is it the VFS?) does a lookup of all the individual inodes. I guess >> it's the VFS. And why is the decision taken to do a lookup? The >> timeout has expired? A lookup is done out of necessity, as a dependency, to accomplish a different operation (the app's syscall). The process calling readdir() will still get only the list of inos and names. This patch assumes the application to perform *some* operation on each of the returned entries (like lstat(), lgetxattr() etc.) for which an implicit lookup might have been necessary. Because readdirplus makes the entry "lookup'ed", the syscall like getxattr() can be issued right away as an upcall before the entry/attribute timeout expires. Avati |
From: Stef B. <st...@gm...> - 2012-08-14 13:46:15
|
2012/8/14 Anand Avati <av...@re...>: > On 08/14/2012 01:13 AM, Stef Bon wrote: >> >> Hi, >> >> isnt it a good idea to have a call which refreshes the attributes of >> an entry in the kernel cache. It's then possible to use this call for: >> >> . readdir plus constructions, where a readdir pushes the attributes to >> prevent a lookup > > > Wouldn't that involve one "push" call to update the attribute of one entry? > The readdirplus approach refreshes attributes of all the returned entries > with the same reply. Yes that's true. But I think it's better still to have one call for refreshing the attributes without a request. > > >> . fs notify constructions, where the fuse fs detects a change on the >> backend implementation specific, and pushes the change to the vfs. > > > This is a possibility, but I consider it outside the scope of this patch - > as a separate effort. The strategy to use such a call is more complex and FS > implementation specific, compared to readdirplus() which offers reasonable > performance boost with a very little increase in complexity. I do not agree. It is not outside thescope, of your patch. I think in general that it's better to have one call which updates the attributes without a request for multiple purposes than different calls which are doing almost the same. Why is the "strategy to use such a call more complex", as you mention? > A lookup is done out of necessity, as a dependency, to accomplish a > different operation (the app's syscall). The process calling readdir() will > still get only the list of inos and names. This patch assumes the > application to perform *some* operation on each of the returned entries > (like lstat(), lgetxattr() etc.) for which an implicit lookup might have > been necessary. Because readdirplus makes the entry "lookup'ed", the syscall > like getxattr() can be issued right away as an upcall before the > entry/attribute timeout expires. > Stef |
From: Anand V. A. <av...@re...> - 2012-08-19 12:53:33
|
This patch implements readdirplus support in FUSE, similar to NFS. The payload returned in the readdirplus call contains 'fuse_entry_out' structure thereby providing all the necessary inputs for 'faking' a lookup() operation on the spot. If the dentry and inode already existed (for e.g. in a re-run of ls -l) then just the inode attributes timeout and dentry timeout are refreshed. With a simple client->network->server implementation of a FUSE based filesystem, the following performance observations were made: Test: Performing a filesystem crawl over 20,000 files with sh# time ls -lR /mnt Without readdirplus: Run 1: 18.1s Run 2: 16.0s Run 3: 16.2s With readdirplus: Run 1: 4.1s Run 2: 3.8s Run 3: 3.8s The performance improvement is significant as it avoided 20,000 upcalls calls (lookup). Cache consistency is no worse than what already is. Signed-off-by: Anand V. Avati <av...@re...> --- fs/fuse/dir.c | 178 ++++++++++++++++++++++++++++++++++++++++++++++++- fs/fuse/fuse_i.h | 3 + fs/fuse/inode.c | 5 +- include/linux/fuse.h | 11 +++ 4 files changed, 192 insertions(+), 5 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 8964cf3..350a831 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1152,6 +1152,166 @@ static int parse_dirfile(char *buf, size_t nbytes, struct file *file, return 0; } +void fuse_alloc_and_forget(struct file *file, u64 nodeid) +{ + struct inode *inode = file->f_path.dentry->d_inode; + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_req *req; + struct fuse_forget_in inarg = { + .nlookup = 1, + }; + + req = fuse_get_req_nofail(fc, file); + req->in.h.opcode = FUSE_FORGET; + req->in.h.nodeid = nodeid; + req->in.numargs = 1; + req->in.args[0].size = sizeof(inarg); + req->in.args[0].value = &inarg; + req->force = 1; + fuse_request_send(fc, req); + fuse_put_request(fc, req); + return; +} + +static int fuse_direntp_link(struct file *file, struct fuse_direntp *direntp) +{ + int ret = -1; + struct fuse_entry_out *feo = &direntp->feo; + struct fuse_dirent *dirent = &direntp->dirent; + struct dentry *parent = file->f_path.dentry; + struct qstr name = QSTR_INIT(dirent->name, dirent->namelen); + struct dentry *dentry; + struct dentry *alias; + struct inode *dir = parent->d_inode; + struct fuse_conn *fc; + struct inode *inode; + u64 attr_version; + + if (!feo->nodeid) { + /* Unlike in the case of fuse_lookup, zero nodeid does + not mean ENOENT. Instead, it only means the userspace + filesystem did not want to return attributes/handle for + this entry. + + So do nothing. + */ + return 0; + } + + if (name.name[0] == '.') { + /* We could potentially refresh the attributes of the + directory and its parent? + */ + if (name.len == 1) + return 0; + if (name.name[1] == '.' && name.len == 2) + return 0; + } + fc = get_fuse_conn(dir); + + name.hash = full_name_hash(name.name, name.len); + dentry = d_lookup(parent, &name); + if (dentry && dentry->d_inode) { + inode = dentry->d_inode; + if (get_node_id(inode) == feo->nodeid) { + struct fuse_inode *fi; + fi = get_fuse_inode(inode); + spin_lock(&fc->lock); + fi->nlookup++; + spin_unlock(&fc->lock); + + /* The other branch to 'found' comes + via fuse_iget() which bumps nlookup inside + */ + goto found; + } else { + fuse_alloc_and_forget(file, get_node_id(inode)); + fuse_invalidate_entry(dentry); + dput(dentry); + dentry = NULL; + } + } + + dentry = d_alloc(parent, &name); + if (!dentry) + goto out; + + inode = fuse_iget(dir->i_sb, feo->nodeid, feo->generation, + &feo->attr, entry_attr_timeout(feo), + attr_version); + if (!inode || IS_ERR(inode)) + goto out; + + alias = d_materialise_unique(dentry, inode); + if (IS_ERR(alias)) + goto out; + if (alias) { + dput(dentry); + dentry = alias; + } + +found: + attr_version = fuse_get_attr_version(fc); + + fuse_change_attributes(inode, &feo->attr, + entry_attr_timeout(feo), + attr_version); + + fuse_change_entry_timeout(dentry, feo); + + ret = 0; +out: + if (dentry) + dput(dentry); + return ret; +} + +static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file, + void *dstbuf, filldir_t filldir) +{ + struct fuse_direntp *direntp; + struct fuse_dirent *dirent; + size_t reclen; + int over = 0; + int ret; + + while (nbytes >= FUSE_NAME_OFFSET_DIRENTP) { + direntp = (struct fuse_direntp *) buf; + dirent = &direntp->dirent; + reclen = FUSE_DIRENTP_SIZE(direntp); + + if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX) + return -EIO; + if (reclen > nbytes) + break; + + if (!over) { + /* We fill entries into dstbuf only as much as + it can hold. But we still continue iterating + over remaining entries to link them. If not, + we need to send a FORGET for each of those + which we did not link. + */ + over = filldir(dstbuf, dirent->name, dirent->namelen, + file->f_pos, dirent->ino, + dirent->type); + file->f_pos = dirent->off; + } + + buf += reclen; + nbytes -= reclen; + + ret = fuse_direntp_link(file, direntp); + if (ret) { + struct fuse_conn *fc; + fc = get_fuse_conn(file->f_path.dentry->d_inode); + fuse_alloc_and_forget(file, direntp->feo.nodeid); + } + } + + return 0; +} + static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) { int err; @@ -1176,14 +1336,24 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) req->out.argpages = 1; req->num_pages = 1; req->pages[0] = page; - fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, FUSE_READDIR); + if (fc->do_readdirplus) + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, + FUSE_READDIRPLUS); + else + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, + FUSE_READDIR); fuse_request_send(fc, req); nbytes = req->out.args[0].size; err = req->out.h.error; fuse_put_request(fc, req); - if (!err) - err = parse_dirfile(page_address(page), nbytes, file, dstbuf, - filldir); + if (!err) { + if (fc->do_readdirplus) + err = parse_dirplusfile(page_address(page), nbytes, + file, dstbuf, filldir); + else + err = parse_dirfile(page_address(page), nbytes, file, + dstbuf, filldir); + } __free_page(page); fuse_invalidate_attr(inode); /* atime changed */ diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index e24dd74..15e16e0 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -487,6 +487,9 @@ struct fuse_conn { /** Use enhanced/automatic page cache invalidation. */ unsigned auto_inval_data:1; + /** Does the filesystem support readdir-plus? */ + unsigned do_readdirplus:1; + /** The number of requests waiting for completion */ atomic_t num_waiting; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index ce0a283..dae933b 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -857,6 +857,8 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req) fc->dont_mask = 1; if (arg->flags & FUSE_AUTO_INVAL_DATA) fc->auto_inval_data = 1; + if (arg->flags & FUSE_DO_READDIRPLUS) + fc->do_readdirplus = 1; } else { ra_pages = fc->max_read / PAGE_CACHE_SIZE; fc->no_lock = 1; @@ -883,7 +885,8 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) arg->flags |= FUSE_ASYNC_READ | FUSE_POSIX_LOCKS | FUSE_ATOMIC_O_TRUNC | FUSE_EXPORT_SUPPORT | FUSE_BIG_WRITES | FUSE_DONT_MASK | FUSE_SPLICE_WRITE | FUSE_SPLICE_MOVE | FUSE_SPLICE_READ | - FUSE_FLOCK_LOCKS | FUSE_IOCTL_DIR | FUSE_AUTO_INVAL_DATA; + FUSE_FLOCK_LOCKS | FUSE_IOCTL_DIR | FUSE_AUTO_INVAL_DATA | + FUSE_DO_READDIRPLUS; req->in.h.opcode = FUSE_INIT; req->in.numargs = 1; req->in.args[0].size = sizeof(*arg); diff --git a/include/linux/fuse.h b/include/linux/fuse.h index d8c713e..fca0008 100644 --- a/include/linux/fuse.h +++ b/include/linux/fuse.h @@ -193,6 +193,7 @@ struct fuse_file_lock { #define FUSE_FLOCK_LOCKS (1 << 10) #define FUSE_HAS_IOCTL_DIR (1 << 11) #define FUSE_AUTO_INVAL_DATA (1 << 12) +#define FUSE_DO_READDIRPLUS (1 << 13) /** * CUSE INIT request/reply flags @@ -299,6 +300,7 @@ enum fuse_opcode { FUSE_NOTIFY_REPLY = 41, FUSE_BATCH_FORGET = 42, FUSE_FALLOCATE = 43, + FUSE_READDIRPLUS = 44, /* CUSE specific operations */ CUSE_INIT = 4096, @@ -630,6 +632,15 @@ struct fuse_dirent { #define FUSE_DIRENT_SIZE(d) \ FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET + (d)->namelen) +struct fuse_direntp { + struct fuse_entry_out feo; + struct fuse_dirent dirent; +}; + +#define FUSE_NAME_OFFSET_DIRENTP offsetof(struct fuse_direntp, dirent.name) +#define FUSE_DIRENTP_SIZE(d) \ + FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET_DIRENTP + (d)->dirent.namelen) + struct fuse_notify_inval_inode_out { __u64 ino; __s64 off; -- 1.7.4.4 |
From: Anand A. <av...@re...> - 2012-08-19 13:15:30
|
This version of the patch - incorporates review comments from the previous submission - fixes nlookup leak on entries in the first readdirplus - cleaner fuse_direntp structure Thanks, Avati On 08/19/2012 05:53 AM, Anand V. Avati wrote: > This patch implements readdirplus support in FUSE, similar to NFS. > The payload returned in the readdirplus call contains > 'fuse_entry_out' structure thereby providing all the necessary inputs > for 'faking' a lookup() operation on the spot. > > If the dentry and inode already existed (for e.g. in a re-run of ls -l) > then just the inode attributes timeout and dentry timeout are refreshed. > > With a simple client->network->server implementation of a FUSE based > filesystem, the following performance observations were made: > > Test: Performing a filesystem crawl over 20,000 files with > > sh# time ls -lR /mnt > > Without readdirplus: > Run 1: 18.1s > Run 2: 16.0s > Run 3: 16.2s > > With readdirplus: > Run 1: 4.1s > Run 2: 3.8s > Run 3: 3.8s > > The performance improvement is significant as it avoided 20,000 upcalls > calls (lookup). Cache consistency is no worse than what already is. > > Signed-off-by: Anand V. Avati <av...@re...> > --- > fs/fuse/dir.c | 178 ++++++++++++++++++++++++++++++++++++++++++++++++- > fs/fuse/fuse_i.h | 3 + > fs/fuse/inode.c | 5 +- > include/linux/fuse.h | 11 +++ > 4 files changed, 192 insertions(+), 5 deletions(-) > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > index 8964cf3..350a831 100644 > --- a/fs/fuse/dir.c > +++ b/fs/fuse/dir.c > @@ -1152,6 +1152,166 @@ static int parse_dirfile(char *buf, size_t nbytes, struct file *file, > return 0; > } > > +void fuse_alloc_and_forget(struct file *file, u64 nodeid) > +{ > + struct inode *inode = file->f_path.dentry->d_inode; > + struct fuse_conn *fc = get_fuse_conn(inode); > + struct fuse_req *req; > + struct fuse_forget_in inarg = { > + .nlookup = 1, > + }; > + > + req = fuse_get_req_nofail(fc, file); > + req->in.h.opcode = FUSE_FORGET; > + req->in.h.nodeid = nodeid; > + req->in.numargs = 1; > + req->in.args[0].size = sizeof(inarg); > + req->in.args[0].value = &inarg; > + req->force = 1; > + fuse_request_send(fc, req); > + fuse_put_request(fc, req); > + return; > +} > + > +static int fuse_direntp_link(struct file *file, struct fuse_direntp *direntp) > +{ > + int ret = -1; > + struct fuse_entry_out *feo = &direntp->feo; > + struct fuse_dirent *dirent = &direntp->dirent; > + struct dentry *parent = file->f_path.dentry; > + struct qstr name = QSTR_INIT(dirent->name, dirent->namelen); > + struct dentry *dentry; > + struct dentry *alias; > + struct inode *dir = parent->d_inode; > + struct fuse_conn *fc; > + struct inode *inode; > + u64 attr_version; > + > + if (!feo->nodeid) { > + /* Unlike in the case of fuse_lookup, zero nodeid does > + not mean ENOENT. Instead, it only means the userspace > + filesystem did not want to return attributes/handle for > + this entry. > + > + So do nothing. > + */ > + return 0; > + } > + > + if (name.name[0] == '.') { > + /* We could potentially refresh the attributes of the > + directory and its parent? > + */ > + if (name.len == 1) > + return 0; > + if (name.name[1] == '.' && name.len == 2) > + return 0; > + } > + fc = get_fuse_conn(dir); > + > + name.hash = full_name_hash(name.name, name.len); > + dentry = d_lookup(parent, &name); > + if (dentry && dentry->d_inode) { > + inode = dentry->d_inode; > + if (get_node_id(inode) == feo->nodeid) { > + struct fuse_inode *fi; > + fi = get_fuse_inode(inode); > + spin_lock(&fc->lock); > + fi->nlookup++; > + spin_unlock(&fc->lock); > + > + /* The other branch to 'found' comes > + via fuse_iget() which bumps nlookup inside > + */ > + goto found; > + } else { > + fuse_alloc_and_forget(file, get_node_id(inode)); > + fuse_invalidate_entry(dentry); > + dput(dentry); > + dentry = NULL; > + } > + } > + > + dentry = d_alloc(parent, &name); > + if (!dentry) > + goto out; > + > + inode = fuse_iget(dir->i_sb, feo->nodeid, feo->generation, > + &feo->attr, entry_attr_timeout(feo), > + attr_version); > + if (!inode || IS_ERR(inode)) > + goto out; > + > + alias = d_materialise_unique(dentry, inode); > + if (IS_ERR(alias)) > + goto out; > + if (alias) { > + dput(dentry); > + dentry = alias; > + } > + > +found: > + attr_version = fuse_get_attr_version(fc); > + > + fuse_change_attributes(inode, &feo->attr, > + entry_attr_timeout(feo), > + attr_version); > + > + fuse_change_entry_timeout(dentry, feo); > + > + ret = 0; > +out: > + if (dentry) > + dput(dentry); > + return ret; > +} > + > +static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file, > + void *dstbuf, filldir_t filldir) > +{ > + struct fuse_direntp *direntp; > + struct fuse_dirent *dirent; > + size_t reclen; > + int over = 0; > + int ret; > + > + while (nbytes >= FUSE_NAME_OFFSET_DIRENTP) { > + direntp = (struct fuse_direntp *) buf; > + dirent = &direntp->dirent; > + reclen = FUSE_DIRENTP_SIZE(direntp); > + > + if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX) > + return -EIO; > + if (reclen > nbytes) > + break; > + > + if (!over) { > + /* We fill entries into dstbuf only as much as > + it can hold. But we still continue iterating > + over remaining entries to link them. If not, > + we need to send a FORGET for each of those > + which we did not link. > + */ > + over = filldir(dstbuf, dirent->name, dirent->namelen, > + file->f_pos, dirent->ino, > + dirent->type); > + file->f_pos = dirent->off; > + } > + > + buf += reclen; > + nbytes -= reclen; > + > + ret = fuse_direntp_link(file, direntp); > + if (ret) { > + struct fuse_conn *fc; > + fc = get_fuse_conn(file->f_path.dentry->d_inode); > + fuse_alloc_and_forget(file, direntp->feo.nodeid); > + } > + } > + > + return 0; > +} > + > static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) > { > int err; > @@ -1176,14 +1336,24 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) > req->out.argpages = 1; > req->num_pages = 1; > req->pages[0] = page; > - fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, FUSE_READDIR); > + if (fc->do_readdirplus) > + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, > + FUSE_READDIRPLUS); > + else > + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, > + FUSE_READDIR); > fuse_request_send(fc, req); > nbytes = req->out.args[0].size; > err = req->out.h.error; > fuse_put_request(fc, req); > - if (!err) > - err = parse_dirfile(page_address(page), nbytes, file, dstbuf, > - filldir); > + if (!err) { > + if (fc->do_readdirplus) > + err = parse_dirplusfile(page_address(page), nbytes, > + file, dstbuf, filldir); > + else > + err = parse_dirfile(page_address(page), nbytes, file, > + dstbuf, filldir); > + } > > __free_page(page); > fuse_invalidate_attr(inode); /* atime changed */ > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > index e24dd74..15e16e0 100644 > --- a/fs/fuse/fuse_i.h > +++ b/fs/fuse/fuse_i.h > @@ -487,6 +487,9 @@ struct fuse_conn { > /** Use enhanced/automatic page cache invalidation. */ > unsigned auto_inval_data:1; > > + /** Does the filesystem support readdir-plus? */ > + unsigned do_readdirplus:1; > + > /** The number of requests waiting for completion */ > atomic_t num_waiting; > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > index ce0a283..dae933b 100644 > --- a/fs/fuse/inode.c > +++ b/fs/fuse/inode.c > @@ -857,6 +857,8 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req) > fc->dont_mask = 1; > if (arg->flags & FUSE_AUTO_INVAL_DATA) > fc->auto_inval_data = 1; > + if (arg->flags & FUSE_DO_READDIRPLUS) > + fc->do_readdirplus = 1; > } else { > ra_pages = fc->max_read / PAGE_CACHE_SIZE; > fc->no_lock = 1; > @@ -883,7 +885,8 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) > arg->flags |= FUSE_ASYNC_READ | FUSE_POSIX_LOCKS | FUSE_ATOMIC_O_TRUNC | > FUSE_EXPORT_SUPPORT | FUSE_BIG_WRITES | FUSE_DONT_MASK | > FUSE_SPLICE_WRITE | FUSE_SPLICE_MOVE | FUSE_SPLICE_READ | > - FUSE_FLOCK_LOCKS | FUSE_IOCTL_DIR | FUSE_AUTO_INVAL_DATA; > + FUSE_FLOCK_LOCKS | FUSE_IOCTL_DIR | FUSE_AUTO_INVAL_DATA | > + FUSE_DO_READDIRPLUS; > req->in.h.opcode = FUSE_INIT; > req->in.numargs = 1; > req->in.args[0].size = sizeof(*arg); > diff --git a/include/linux/fuse.h b/include/linux/fuse.h > index d8c713e..fca0008 100644 > --- a/include/linux/fuse.h > +++ b/include/linux/fuse.h > @@ -193,6 +193,7 @@ struct fuse_file_lock { > #define FUSE_FLOCK_LOCKS (1 << 10) > #define FUSE_HAS_IOCTL_DIR (1 << 11) > #define FUSE_AUTO_INVAL_DATA (1 << 12) > +#define FUSE_DO_READDIRPLUS (1 << 13) > > /** > * CUSE INIT request/reply flags > @@ -299,6 +300,7 @@ enum fuse_opcode { > FUSE_NOTIFY_REPLY = 41, > FUSE_BATCH_FORGET = 42, > FUSE_FALLOCATE = 43, > + FUSE_READDIRPLUS = 44, > > /* CUSE specific operations */ > CUSE_INIT = 4096, > @@ -630,6 +632,15 @@ struct fuse_dirent { > #define FUSE_DIRENT_SIZE(d) \ > FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET + (d)->namelen) > > +struct fuse_direntp { > + struct fuse_entry_out feo; > + struct fuse_dirent dirent; > +}; > + > +#define FUSE_NAME_OFFSET_DIRENTP offsetof(struct fuse_direntp, dirent.name) > +#define FUSE_DIRENTP_SIZE(d) \ > + FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET_DIRENTP + (d)->dirent.namelen) > + > struct fuse_notify_inval_inode_out { > __u64 ino; > __s64 off; > |
From: Miklos S. <mi...@sz...> - 2012-08-22 12:04:07
|
Anand Avati <av...@re...> writes: > This version of the patch > > - incorporates review comments from the previous submission > - fixes nlookup leak on entries in the first readdirplus > - cleaner fuse_direntp structure Anand, Here's an updated version of the patch. It fixes some issues (real and cosmetic) with your latest patch. Can you please look it over and give it a test. Thanks, Miklos --- From: "Anand V. Avati" <av...@re...> Subject: fuse: implement NFS-like readdirplus support Date: Sun, 19 Aug 2012 08:53:23 -0400 This patch implements readdirplus support in FUSE, similar to NFS. The payload returned in the readdirplus call contains 'fuse_entry_out' structure thereby providing all the necessary inputs for 'faking' a lookup() operation on the spot. If the dentry and inode already existed (for e.g. in a re-run of ls -l) then just the inode attributes timeout and dentry timeout are refreshed. With a simple client->network->server implementation of a FUSE based filesystem, the following performance observations were made: Test: Performing a filesystem crawl over 20,000 files with sh# time ls -lR /mnt Without readdirplus: Run 1: 18.1s Run 2: 16.0s Run 3: 16.2s With readdirplus: Run 1: 4.1s Run 2: 3.8s Run 3: 3.8s The performance improvement is significant as it avoided 20,000 upcalls calls (lookup). Cache consistency is no worse than what already is. Signed-off-by: Anand V. Avati <av...@re...> Signed-off-by: Miklos Szeredi <msz...@su...> --- fs/fuse/dev.c | 19 ++++++ fs/fuse/dir.c | 160 +++++++++++++++++++++++++++++++++++++++++++++++++-- fs/fuse/fuse_i.h | 6 + fs/fuse/inode.c | 5 + include/linux/fuse.h | 12 +++ 5 files changed, 197 insertions(+), 5 deletions(-) Index: linux-2.6/fs/fuse/dir.c =================================================================== --- linux-2.6.orig/fs/fuse/dir.c 2012-08-21 14:06:45.000000000 +0200 +++ linux-2.6/fs/fuse/dir.c 2012-08-22 13:48:28.000000000 +0200 @@ -1155,6 +1155,143 @@ static int parse_dirfile(char *buf, size return 0; } +static int fuse_direntplus_link(struct file *file, + struct fuse_direntplus *direntplus, + u64 attr_version) +{ + int err; + struct fuse_entry_out *o = &direntplus->entry_out; + struct fuse_dirent *dirent = &direntplus->dirent; + struct dentry *parent = file->f_path.dentry; + struct qstr name = QSTR_INIT(dirent->name, dirent->namelen); + struct dentry *dentry; + struct dentry *alias; + struct inode *dir = parent->d_inode; + struct fuse_conn *fc; + struct inode *inode; + + if (!o->nodeid) { + /* + * Unlike in the case of fuse_lookup, zero nodeid does not mean + * ENOENT. Instead, it only means the userspace filesystem did + * not want to return attributes/handle for this entry. + * + * So do nothing. + */ + return 0; + } + + if (name.name[0] == '.') { + /* + * We could potentially refresh the attributes of the directory + * and its parent? + */ + if (name.len == 1) + return 0; + if (name.name[1] == '.' && name.len == 2) + return 0; + } + fc = get_fuse_conn(dir); + + name.hash = full_name_hash(name.name, name.len); + dentry = d_lookup(parent, &name); + if (dentry && dentry->d_inode) { + inode = dentry->d_inode; + if (get_node_id(inode) == o->nodeid) { + struct fuse_inode *fi; + fi = get_fuse_inode(inode); + spin_lock(&fc->lock); + fi->nlookup++; + spin_unlock(&fc->lock); + + /* + * The other branch to 'found' comes via fuse_iget() + * which bumps nlookup inside + */ + goto found; + } + err = d_invalidate(dentry); + if (err) + goto out; + dput(dentry); + dentry = NULL; + } + + dentry = d_alloc(parent, &name); + err = -ENOMEM; + if (!dentry) + goto out; + + inode = fuse_iget(dir->i_sb, o->nodeid, o->generation, + &o->attr, entry_attr_timeout(o), attr_version); + if (!inode) + goto out; + + alias = d_materialise_unique(dentry, inode); + err = PTR_ERR(alias); + if (IS_ERR(alias)) + goto out; + if (alias) { + dput(dentry); + dentry = alias; + } + +found: + fuse_change_attributes(inode, &o->attr, entry_attr_timeout(o), + attr_version); + + fuse_change_entry_timeout(dentry, o); + + err = 0; +out: + if (dentry) + dput(dentry); + return err; +} + +static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file, + void *dstbuf, filldir_t filldir, u64 attr_version) +{ + struct fuse_direntplus *direntplus; + struct fuse_dirent *dirent; + size_t reclen; + int over = 0; + int ret; + + while (nbytes >= FUSE_NAME_OFFSET_DIRENTPLUS) { + direntplus = (struct fuse_direntplus *) buf; + dirent = &direntplus->dirent; + reclen = FUSE_DIRENTPLUS_SIZE(direntplus); + + if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX) + return -EIO; + if (reclen > nbytes) + break; + + if (!over) { + /* We fill entries into dstbuf only as much as + it can hold. But we still continue iterating + over remaining entries to link them. If not, + we need to send a FORGET for each of those + which we did not link. + */ + over = filldir(dstbuf, dirent->name, dirent->namelen, + file->f_pos, dirent->ino, + dirent->type); + file->f_pos = dirent->off; + } + + buf += reclen; + nbytes -= reclen; + + ret = fuse_direntplus_link(file, direntplus, attr_version); + if (ret) + fuse_force_forget(file, direntplus->entry_out.nodeid); + } + + return 0; +} + static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) { int err; @@ -1163,6 +1300,7 @@ static int fuse_readdir(struct file *fil struct inode *inode = file->f_path.dentry->d_inode; struct fuse_conn *fc = get_fuse_conn(inode); struct fuse_req *req; + u64 attr_version = 0; if (is_bad_inode(inode)) return -EIO; @@ -1179,14 +1317,28 @@ static int fuse_readdir(struct file *fil req->out.argpages = 1; req->num_pages = 1; req->pages[0] = page; - fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, FUSE_READDIR); + if (fc->do_readdirplus) { + attr_version = fuse_get_attr_version(fc); + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, + FUSE_READDIRPLUS); + } else { + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, + FUSE_READDIR); + } fuse_request_send(fc, req); nbytes = req->out.args[0].size; err = req->out.h.error; fuse_put_request(fc, req); - if (!err) - err = parse_dirfile(page_address(page), nbytes, file, dstbuf, - filldir); + if (!err) { + if (fc->do_readdirplus) { + err = parse_dirplusfile(page_address(page), nbytes, + file, dstbuf, filldir, + attr_version); + } else { + err = parse_dirfile(page_address(page), nbytes, file, + dstbuf, filldir); + } + } __free_page(page); fuse_invalidate_attr(inode); /* atime changed */ Index: linux-2.6/fs/fuse/fuse_i.h =================================================================== --- linux-2.6.orig/fs/fuse/fuse_i.h 2012-08-21 14:06:45.000000000 +0200 +++ linux-2.6/fs/fuse/fuse_i.h 2012-08-22 12:16:39.000000000 +0200 @@ -487,6 +487,9 @@ struct fuse_conn { /** Use enhanced/automatic page cache invalidation. */ unsigned auto_inval_data:1; + /** Does the filesystem support readdir-plus? */ + unsigned do_readdirplus:1; + /** The number of requests waiting for completion */ atomic_t num_waiting; @@ -578,6 +581,9 @@ void fuse_queue_forget(struct fuse_conn struct fuse_forget_link *fuse_alloc_forget(void); +/* Used by READDIRPLUS */ +void fuse_force_forget(struct file *file, u64 nodeid); + /** * Initialize READ or READDIR request */ Index: linux-2.6/fs/fuse/inode.c =================================================================== --- linux-2.6.orig/fs/fuse/inode.c 2012-08-17 08:53:14.000000000 +0200 +++ linux-2.6/fs/fuse/inode.c 2012-08-22 12:02:29.000000000 +0200 @@ -857,6 +857,8 @@ static void process_init_reply(struct fu fc->dont_mask = 1; if (arg->flags & FUSE_AUTO_INVAL_DATA) fc->auto_inval_data = 1; + if (arg->flags & FUSE_DO_READDIRPLUS) + fc->do_readdirplus = 1; } else { ra_pages = fc->max_read / PAGE_CACHE_SIZE; fc->no_lock = 1; @@ -883,7 +885,8 @@ static void fuse_send_init(struct fuse_c arg->flags |= FUSE_ASYNC_READ | FUSE_POSIX_LOCKS | FUSE_ATOMIC_O_TRUNC | FUSE_EXPORT_SUPPORT | FUSE_BIG_WRITES | FUSE_DONT_MASK | FUSE_SPLICE_WRITE | FUSE_SPLICE_MOVE | FUSE_SPLICE_READ | - FUSE_FLOCK_LOCKS | FUSE_IOCTL_DIR | FUSE_AUTO_INVAL_DATA; + FUSE_FLOCK_LOCKS | FUSE_IOCTL_DIR | FUSE_AUTO_INVAL_DATA | + FUSE_DO_READDIRPLUS; req->in.h.opcode = FUSE_INIT; req->in.numargs = 1; req->in.args[0].size = sizeof(*arg); Index: linux-2.6/include/linux/fuse.h =================================================================== --- linux-2.6.orig/include/linux/fuse.h 2012-08-17 08:53:14.000000000 +0200 +++ linux-2.6/include/linux/fuse.h 2012-08-22 13:37:58.000000000 +0200 @@ -193,6 +193,7 @@ struct fuse_file_lock { #define FUSE_FLOCK_LOCKS (1 << 10) #define FUSE_HAS_IOCTL_DIR (1 << 11) #define FUSE_AUTO_INVAL_DATA (1 << 12) +#define FUSE_DO_READDIRPLUS (1 << 13) /** * CUSE INIT request/reply flags @@ -299,6 +300,7 @@ enum fuse_opcode { FUSE_NOTIFY_REPLY = 41, FUSE_BATCH_FORGET = 42, FUSE_FALLOCATE = 43, + FUSE_READDIRPLUS = 44, /* CUSE specific operations */ CUSE_INIT = 4096, @@ -630,6 +632,16 @@ struct fuse_dirent { #define FUSE_DIRENT_SIZE(d) \ FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET + (d)->namelen) +struct fuse_direntplus { + struct fuse_entry_out entry_out; + struct fuse_dirent dirent; +}; + +#define FUSE_NAME_OFFSET_DIRENTPLUS \ + offsetof(struct fuse_direntplus, dirent.name) +#define FUSE_DIRENTPLUS_SIZE(d) \ + FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET_DIRENTPLUS + (d)->dirent.namelen) + struct fuse_notify_inval_inode_out { __u64 ino; __s64 off; Index: linux-2.6/fs/fuse/dev.c =================================================================== --- linux-2.6.orig/fs/fuse/dev.c 2012-07-31 19:08:06.000000000 +0200 +++ linux-2.6/fs/fuse/dev.c 2012-08-22 13:33:20.000000000 +0200 @@ -492,6 +492,25 @@ void fuse_request_send_background_locked fuse_request_send_nowait_locked(fc, req); } +void fuse_force_forget(struct file *file, u64 nodeid) +{ + struct inode *inode = file->f_path.dentry->d_inode; + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_req *req; + struct fuse_forget_in inarg; + + memset(&inarg, 0, sizeof(inarg)); + inarg.nlookup = 1; + req = fuse_get_req_nofail(fc, file); + req->in.h.opcode = FUSE_FORGET; + req->in.h.nodeid = nodeid; + req->in.numargs = 1; + req->in.args[0].size = sizeof(inarg); + req->in.args[0].value = &inarg; + req->isreply = 0; + fuse_request_send_nowait(fc, req); +} + /* * Lock the request. Up to the next unlock_request() there mustn't be * anything that could cause a page-fault. If the request was already |
From: Anand A. <av...@re...> - 2012-08-23 08:34:13
|
On 08/22/2012 05:05 AM, Miklos Szeredi wrote: > Anand Avati <av...@re...> writes: > >> This version of the patch >> >> - incorporates review comments from the previous submission >> - fixes nlookup leak on entries in the first readdirplus >> - cleaner fuse_direntp structure > > Anand, > > Here's an updated version of the patch. It fixes some issues (real and > cosmetic) with your latest patch. Can you please look it over and give > it a test. > > Thanks, > Miklos This patch works fine with all my tests. I could not really understand the attr_version related changes in your patch (hence not surprised if I had got it wrong in my patch), but the rest of your changes look fine. I also see how I was not using the _nowait() variant for submitting forget request. Thanks, Avati > > --- > From: "Anand V. Avati" <av...@re...> > Subject: fuse: implement NFS-like readdirplus support > Date: Sun, 19 Aug 2012 08:53:23 -0400 > > This patch implements readdirplus support in FUSE, similar to NFS. > The payload returned in the readdirplus call contains > 'fuse_entry_out' structure thereby providing all the necessary inputs > for 'faking' a lookup() operation on the spot. > > If the dentry and inode already existed (for e.g. in a re-run of ls -l) > then just the inode attributes timeout and dentry timeout are refreshed. > > With a simple client->network->server implementation of a FUSE based > filesystem, the following performance observations were made: > > Test: Performing a filesystem crawl over 20,000 files with > > sh# time ls -lR /mnt > > Without readdirplus: > Run 1: 18.1s > Run 2: 16.0s > Run 3: 16.2s > > With readdirplus: > Run 1: 4.1s > Run 2: 3.8s > Run 3: 3.8s > > The performance improvement is significant as it avoided 20,000 upcalls > calls (lookup). Cache consistency is no worse than what already is. > > Signed-off-by: Anand V. Avati <av...@re...> > Signed-off-by: Miklos Szeredi <msz...@su...> > --- > fs/fuse/dev.c | 19 ++++++ > fs/fuse/dir.c | 160 +++++++++++++++++++++++++++++++++++++++++++++++++-- > fs/fuse/fuse_i.h | 6 + > fs/fuse/inode.c | 5 + > include/linux/fuse.h | 12 +++ > 5 files changed, 197 insertions(+), 5 deletions(-) > > Index: linux-2.6/fs/fuse/dir.c > =================================================================== > --- linux-2.6.orig/fs/fuse/dir.c 2012-08-21 14:06:45.000000000 +0200 > +++ linux-2.6/fs/fuse/dir.c 2012-08-22 13:48:28.000000000 +0200 > @@ -1155,6 +1155,143 @@ static int parse_dirfile(char *buf, size > return 0; > } > > +static int fuse_direntplus_link(struct file *file, > + struct fuse_direntplus *direntplus, > + u64 attr_version) > +{ > + int err; > + struct fuse_entry_out *o = &direntplus->entry_out; > + struct fuse_dirent *dirent = &direntplus->dirent; > + struct dentry *parent = file->f_path.dentry; > + struct qstr name = QSTR_INIT(dirent->name, dirent->namelen); > + struct dentry *dentry; > + struct dentry *alias; > + struct inode *dir = parent->d_inode; > + struct fuse_conn *fc; > + struct inode *inode; > + > + if (!o->nodeid) { > + /* > + * Unlike in the case of fuse_lookup, zero nodeid does not mean > + * ENOENT. Instead, it only means the userspace filesystem did > + * not want to return attributes/handle for this entry. > + * > + * So do nothing. > + */ > + return 0; > + } > + > + if (name.name[0] == '.') { > + /* > + * We could potentially refresh the attributes of the directory > + * and its parent? > + */ > + if (name.len == 1) > + return 0; > + if (name.name[1] == '.' && name.len == 2) > + return 0; > + } > + fc = get_fuse_conn(dir); > + > + name.hash = full_name_hash(name.name, name.len); > + dentry = d_lookup(parent, &name); > + if (dentry && dentry->d_inode) { > + inode = dentry->d_inode; > + if (get_node_id(inode) == o->nodeid) { > + struct fuse_inode *fi; > + fi = get_fuse_inode(inode); > + spin_lock(&fc->lock); > + fi->nlookup++; > + spin_unlock(&fc->lock); > + > + /* > + * The other branch to 'found' comes via fuse_iget() > + * which bumps nlookup inside > + */ > + goto found; > + } > + err = d_invalidate(dentry); > + if (err) > + goto out; > + dput(dentry); > + dentry = NULL; > + } > + > + dentry = d_alloc(parent, &name); > + err = -ENOMEM; > + if (!dentry) > + goto out; > + > + inode = fuse_iget(dir->i_sb, o->nodeid, o->generation, > + &o->attr, entry_attr_timeout(o), attr_version); > + if (!inode) > + goto out; > + > + alias = d_materialise_unique(dentry, inode); > + err = PTR_ERR(alias); > + if (IS_ERR(alias)) > + goto out; > + if (alias) { > + dput(dentry); > + dentry = alias; > + } > + > +found: > + fuse_change_attributes(inode, &o->attr, entry_attr_timeout(o), > + attr_version); > + > + fuse_change_entry_timeout(dentry, o); > + > + err = 0; > +out: > + if (dentry) > + dput(dentry); > + return err; > +} > + > +static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file, > + void *dstbuf, filldir_t filldir, u64 attr_version) > +{ > + struct fuse_direntplus *direntplus; > + struct fuse_dirent *dirent; > + size_t reclen; > + int over = 0; > + int ret; > + > + while (nbytes >= FUSE_NAME_OFFSET_DIRENTPLUS) { > + direntplus = (struct fuse_direntplus *) buf; > + dirent = &direntplus->dirent; > + reclen = FUSE_DIRENTPLUS_SIZE(direntplus); > + > + if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX) > + return -EIO; > + if (reclen > nbytes) > + break; > + > + if (!over) { > + /* We fill entries into dstbuf only as much as > + it can hold. But we still continue iterating > + over remaining entries to link them. If not, > + we need to send a FORGET for each of those > + which we did not link. > + */ > + over = filldir(dstbuf, dirent->name, dirent->namelen, > + file->f_pos, dirent->ino, > + dirent->type); > + file->f_pos = dirent->off; > + } > + > + buf += reclen; > + nbytes -= reclen; > + > + ret = fuse_direntplus_link(file, direntplus, attr_version); > + if (ret) > + fuse_force_forget(file, direntplus->entry_out.nodeid); > + } > + > + return 0; > +} > + > static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) > { > int err; > @@ -1163,6 +1300,7 @@ static int fuse_readdir(struct file *fil > struct inode *inode = file->f_path.dentry->d_inode; > struct fuse_conn *fc = get_fuse_conn(inode); > struct fuse_req *req; > + u64 attr_version = 0; > > if (is_bad_inode(inode)) > return -EIO; > @@ -1179,14 +1317,28 @@ static int fuse_readdir(struct file *fil > req->out.argpages = 1; > req->num_pages = 1; > req->pages[0] = page; > - fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, FUSE_READDIR); > + if (fc->do_readdirplus) { > + attr_version = fuse_get_attr_version(fc); > + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, > + FUSE_READDIRPLUS); > + } else { > + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, > + FUSE_READDIR); > + } > fuse_request_send(fc, req); > nbytes = req->out.args[0].size; > err = req->out.h.error; > fuse_put_request(fc, req); > - if (!err) > - err = parse_dirfile(page_address(page), nbytes, file, dstbuf, > - filldir); > + if (!err) { > + if (fc->do_readdirplus) { > + err = parse_dirplusfile(page_address(page), nbytes, > + file, dstbuf, filldir, > + attr_version); > + } else { > + err = parse_dirfile(page_address(page), nbytes, file, > + dstbuf, filldir); > + } > + } > > __free_page(page); > fuse_invalidate_attr(inode); /* atime changed */ > Index: linux-2.6/fs/fuse/fuse_i.h > =================================================================== > --- linux-2.6.orig/fs/fuse/fuse_i.h 2012-08-21 14:06:45.000000000 +0200 > +++ linux-2.6/fs/fuse/fuse_i.h 2012-08-22 12:16:39.000000000 +0200 > @@ -487,6 +487,9 @@ struct fuse_conn { > /** Use enhanced/automatic page cache invalidation. */ > unsigned auto_inval_data:1; > > + /** Does the filesystem support readdir-plus? */ > + unsigned do_readdirplus:1; > + > /** The number of requests waiting for completion */ > atomic_t num_waiting; > > @@ -578,6 +581,9 @@ void fuse_queue_forget(struct fuse_conn > > struct fuse_forget_link *fuse_alloc_forget(void); > > +/* Used by READDIRPLUS */ > +void fuse_force_forget(struct file *file, u64 nodeid); > + > /** > * Initialize READ or READDIR request > */ > Index: linux-2.6/fs/fuse/inode.c > =================================================================== > --- linux-2.6.orig/fs/fuse/inode.c 2012-08-17 08:53:14.000000000 +0200 > +++ linux-2.6/fs/fuse/inode.c 2012-08-22 12:02:29.000000000 +0200 > @@ -857,6 +857,8 @@ static void process_init_reply(struct fu > fc->dont_mask = 1; > if (arg->flags & FUSE_AUTO_INVAL_DATA) > fc->auto_inval_data = 1; > + if (arg->flags & FUSE_DO_READDIRPLUS) > + fc->do_readdirplus = 1; > } else { > ra_pages = fc->max_read / PAGE_CACHE_SIZE; > fc->no_lock = 1; > @@ -883,7 +885,8 @@ static void fuse_send_init(struct fuse_c > arg->flags |= FUSE_ASYNC_READ | FUSE_POSIX_LOCKS | FUSE_ATOMIC_O_TRUNC | > FUSE_EXPORT_SUPPORT | FUSE_BIG_WRITES | FUSE_DONT_MASK | > FUSE_SPLICE_WRITE | FUSE_SPLICE_MOVE | FUSE_SPLICE_READ | > - FUSE_FLOCK_LOCKS | FUSE_IOCTL_DIR | FUSE_AUTO_INVAL_DATA; > + FUSE_FLOCK_LOCKS | FUSE_IOCTL_DIR | FUSE_AUTO_INVAL_DATA | > + FUSE_DO_READDIRPLUS; > req->in.h.opcode = FUSE_INIT; > req->in.numargs = 1; > req->in.args[0].size = sizeof(*arg); > Index: linux-2.6/include/linux/fuse.h > =================================================================== > --- linux-2.6.orig/include/linux/fuse.h 2012-08-17 08:53:14.000000000 +0200 > +++ linux-2.6/include/linux/fuse.h 2012-08-22 13:37:58.000000000 +0200 > @@ -193,6 +193,7 @@ struct fuse_file_lock { > #define FUSE_FLOCK_LOCKS (1 << 10) > #define FUSE_HAS_IOCTL_DIR (1 << 11) > #define FUSE_AUTO_INVAL_DATA (1 << 12) > +#define FUSE_DO_READDIRPLUS (1 << 13) > > /** > * CUSE INIT request/reply flags > @@ -299,6 +300,7 @@ enum fuse_opcode { > FUSE_NOTIFY_REPLY = 41, > FUSE_BATCH_FORGET = 42, > FUSE_FALLOCATE = 43, > + FUSE_READDIRPLUS = 44, > > /* CUSE specific operations */ > CUSE_INIT = 4096, > @@ -630,6 +632,16 @@ struct fuse_dirent { > #define FUSE_DIRENT_SIZE(d) \ > FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET + (d)->namelen) > > +struct fuse_direntplus { > + struct fuse_entry_out entry_out; > + struct fuse_dirent dirent; > +}; > + > +#define FUSE_NAME_OFFSET_DIRENTPLUS \ > + offsetof(struct fuse_direntplus, dirent.name) > +#define FUSE_DIRENTPLUS_SIZE(d) \ > + FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET_DIRENTPLUS + (d)->dirent.namelen) > + > struct fuse_notify_inval_inode_out { > __u64 ino; > __s64 off; > Index: linux-2.6/fs/fuse/dev.c > =================================================================== > --- linux-2.6.orig/fs/fuse/dev.c 2012-07-31 19:08:06.000000000 +0200 > +++ linux-2.6/fs/fuse/dev.c 2012-08-22 13:33:20.000000000 +0200 > @@ -492,6 +492,25 @@ void fuse_request_send_background_locked > fuse_request_send_nowait_locked(fc, req); > } > > +void fuse_force_forget(struct file *file, u64 nodeid) > +{ > + struct inode *inode = file->f_path.dentry->d_inode; > + struct fuse_conn *fc = get_fuse_conn(inode); > + struct fuse_req *req; > + struct fuse_forget_in inarg; > + > + memset(&inarg, 0, sizeof(inarg)); > + inarg.nlookup = 1; > + req = fuse_get_req_nofail(fc, file); > + req->in.h.opcode = FUSE_FORGET; > + req->in.h.nodeid = nodeid; > + req->in.numargs = 1; > + req->in.args[0].size = sizeof(inarg); > + req->in.args[0].value = &inarg; > + req->isreply = 0; > + fuse_request_send_nowait(fc, req); > +} > + > /* > * Lock the request. Up to the next unlock_request() there mustn't be > * anything that could cause a page-fault. If the request was already > |
From: Feng S. <ste...@gm...> - 2013-01-15 03:23:59
|
The original idea belows to Anand V. Avati: http://sourceforge.net/mailarchive/message.php?msg_id=29562240 I did some benchmarking and add the adaptive mechanism as in NFS. For the detail of the testing, please refer to: http://sourceforge.net/mailarchive/message.php?msg_id=30305348 In my setup, this patch set can give about 50% performance up with "ls -l" and without much penalty with pure "ls" because of the second patch. Rebased to 3.8-rc3. Feng Shuo (1): FUSE: Adapt readdirplus to application usage patterns Miklos Szeredi (1): fuse: implement NFS-like readdirplus support fs/fuse/dev.c | 19 +++++ fs/fuse/dir.c | 206 ++++++++++++++++++++++++++++++++++++++++++++-- fs/fuse/fuse_i.h | 16 ++++ fs/fuse/inode.c | 8 +- include/uapi/linux/fuse.h | 12 +++ 5 files changed, 254 insertions(+), 7 deletions(-) -- 1.7.12.4 |
From: Feng S. <ste...@gm...> - 2013-01-15 03:24:08
|
From: Miklos Szeredi <mi...@sz...> Anand Avati <av...@re...> writes: > This version of the patch > > - incorporates review comments from the previous submission > - fixes nlookup leak on entries in the first readdirplus > - cleaner fuse_direntp structure Anand, Here's an updated version of the patch. It fixes some issues (real and cosmetic) with your latest patch. Can you please look it over and give it a test. Thanks, Miklos Signed-off-by: Feng Shuo <ste...@gm...> --- fs/fuse/dev.c | 19 ++++++ fs/fuse/dir.c | 160 ++++++++++++++++++++++++++++++++++++++++++++-- fs/fuse/fuse_i.h | 6 ++ fs/fuse/inode.c | 5 +- include/uapi/linux/fuse.h | 12 ++++ 5 files changed, 197 insertions(+), 5 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index c163353..8e8b216 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -491,6 +491,25 @@ void fuse_request_send_background_locked(struct fuse_conn *fc, fuse_request_send_nowait_locked(fc, req); } +void fuse_force_forget(struct file *file, u64 nodeid) +{ + struct inode *inode = file->f_path.dentry->d_inode; + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_req *req; + struct fuse_forget_in inarg; + + memset(&inarg, 0, sizeof(inarg)); + inarg.nlookup = 1; + req = fuse_get_req_nofail(fc, file); + req->in.h.opcode = FUSE_FORGET; + req->in.h.nodeid = nodeid; + req->in.numargs = 1; + req->in.args[0].size = sizeof(inarg); + req->in.args[0].value = &inarg; + req->isreply = 0; + fuse_request_send_nowait(fc, req); +} + /* * Lock the request. Up to the next unlock_request() there mustn't be * anything that could cause a page-fault. If the request was already diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index b7c09f9..dcc1e52 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1155,6 +1155,143 @@ static int parse_dirfile(char *buf, size_t nbytes, struct file *file, return 0; } +static int fuse_direntplus_link(struct file *file, + struct fuse_direntplus *direntplus, + u64 attr_version) +{ + int err; + struct fuse_entry_out *o = &direntplus->entry_out; + struct fuse_dirent *dirent = &direntplus->dirent; + struct dentry *parent = file->f_path.dentry; + struct qstr name = QSTR_INIT(dirent->name, dirent->namelen); + struct dentry *dentry; + struct dentry *alias; + struct inode *dir = parent->d_inode; + struct fuse_conn *fc; + struct inode *inode; + + if (!o->nodeid) { + /* + * Unlike in the case of fuse_lookup, zero nodeid does not mean + * ENOENT. Instead, it only means the userspace filesystem did + * not want to return attributes/handle for this entry. + * + * So do nothing. + */ + return 0; + } + + if (name.name[0] == '.') { + /* + * We could potentially refresh the attributes of the directory + * and its parent? + */ + if (name.len == 1) + return 0; + if (name.name[1] == '.' && name.len == 2) + return 0; + } + fc = get_fuse_conn(dir); + + name.hash = full_name_hash(name.name, name.len); + dentry = d_lookup(parent, &name); + if (dentry && dentry->d_inode) { + inode = dentry->d_inode; + if (get_node_id(inode) == o->nodeid) { + struct fuse_inode *fi; + fi = get_fuse_inode(inode); + spin_lock(&fc->lock); + fi->nlookup++; + spin_unlock(&fc->lock); + + /* + * The other branch to 'found' comes via fuse_iget() + * which bumps nlookup inside + */ + goto found; + } + err = d_invalidate(dentry); + if (err) + goto out; + dput(dentry); + dentry = NULL; + } + + dentry = d_alloc(parent, &name); + err = -ENOMEM; + if (!dentry) + goto out; + + inode = fuse_iget(dir->i_sb, o->nodeid, o->generation, + &o->attr, entry_attr_timeout(o), attr_version); + if (!inode) + goto out; + + alias = d_materialise_unique(dentry, inode); + err = PTR_ERR(alias); + if (IS_ERR(alias)) + goto out; + if (alias) { + dput(dentry); + dentry = alias; + } + +found: + fuse_change_attributes(inode, &o->attr, entry_attr_timeout(o), + attr_version); + + fuse_change_entry_timeout(dentry, o); + + err = 0; +out: + if (dentry) + dput(dentry); + return err; +} + +static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file, + void *dstbuf, filldir_t filldir, u64 attr_version) +{ + struct fuse_direntplus *direntplus; + struct fuse_dirent *dirent; + size_t reclen; + int over = 0; + int ret; + + while (nbytes >= FUSE_NAME_OFFSET_DIRENTPLUS) { + direntplus = (struct fuse_direntplus *) buf; + dirent = &direntplus->dirent; + reclen = FUSE_DIRENTPLUS_SIZE(direntplus); + + if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX) + return -EIO; + if (reclen > nbytes) + break; + + if (!over) { + /* We fill entries into dstbuf only as much as + it can hold. But we still continue iterating + over remaining entries to link them. If not, + we need to send a FORGET for each of those + which we did not link. + */ + over = filldir(dstbuf, dirent->name, dirent->namelen, + file->f_pos, dirent->ino, + dirent->type); + file->f_pos = dirent->off; + } + + buf += reclen; + nbytes -= reclen; + + ret = fuse_direntplus_link(file, direntplus, attr_version); + if (ret) + fuse_force_forget(file, direntplus->entry_out.nodeid); + } + + return 0; +} + static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) { int err; @@ -1163,6 +1300,7 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) struct inode *inode = file->f_path.dentry->d_inode; struct fuse_conn *fc = get_fuse_conn(inode); struct fuse_req *req; + u64 attr_version = 0; if (is_bad_inode(inode)) return -EIO; @@ -1179,14 +1317,28 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) req->out.argpages = 1; req->num_pages = 1; req->pages[0] = page; - fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, FUSE_READDIR); + if (fc->do_readdirplus) { + attr_version = fuse_get_attr_version(fc); + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, + FUSE_READDIRPLUS); + } else { + fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, + FUSE_READDIR); + } fuse_request_send(fc, req); nbytes = req->out.args[0].size; err = req->out.h.error; fuse_put_request(fc, req); - if (!err) - err = parse_dirfile(page_address(page), nbytes, file, dstbuf, - filldir); + if (!err) { + if (fc->do_readdirplus) { + err = parse_dirplusfile(page_address(page), nbytes, + file, dstbuf, filldir, + attr_version); + } else { + err = parse_dirfile(page_address(page), nbytes, file, + dstbuf, filldir); + } + } __free_page(page); fuse_invalidate_attr(inode); /* atime changed */ diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index e105a53..5c50553 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -487,6 +487,9 @@ struct fuse_conn { /** Use enhanced/automatic page cache invalidation. */ unsigned auto_inval_data:1; + /** Does the filesystem support readdir-plus? */ + unsigned do_readdirplus:1; + /** The number of requests waiting for completion */ atomic_t num_waiting; @@ -578,6 +581,9 @@ void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget, struct fuse_forget_link *fuse_alloc_forget(void); +/* Used by READDIRPLUS */ +void fuse_force_forget(struct file *file, u64 nodeid); + /** * Initialize READ or READDIR request */ diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 73ca6b7..6f7d574 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -863,6 +863,8 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req) fc->dont_mask = 1; if (arg->flags & FUSE_AUTO_INVAL_DATA) fc->auto_inval_data = 1; + if (arg->flags & FUSE_DO_READDIRPLUS) + fc->do_readdirplus = 1; } else { ra_pages = fc->max_read / PAGE_CACHE_SIZE; fc->no_lock = 1; @@ -889,7 +891,8 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) arg->flags |= FUSE_ASYNC_READ | FUSE_POSIX_LOCKS | FUSE_ATOMIC_O_TRUNC | FUSE_EXPORT_SUPPORT | FUSE_BIG_WRITES | FUSE_DONT_MASK | FUSE_SPLICE_WRITE | FUSE_SPLICE_MOVE | FUSE_SPLICE_READ | - FUSE_FLOCK_LOCKS | FUSE_IOCTL_DIR | FUSE_AUTO_INVAL_DATA; + FUSE_FLOCK_LOCKS | FUSE_IOCTL_DIR | FUSE_AUTO_INVAL_DATA | + FUSE_DO_READDIRPLUS; req->in.h.opcode = FUSE_INIT; req->in.numargs = 1; req->in.args[0].size = sizeof(*arg); diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index d8c713e..5dc1fea 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -193,6 +193,7 @@ struct fuse_file_lock { #define FUSE_FLOCK_LOCKS (1 << 10) #define FUSE_HAS_IOCTL_DIR (1 << 11) #define FUSE_AUTO_INVAL_DATA (1 << 12) +#define FUSE_DO_READDIRPLUS (1 << 13) /** * CUSE INIT request/reply flags @@ -299,6 +300,7 @@ enum fuse_opcode { FUSE_NOTIFY_REPLY = 41, FUSE_BATCH_FORGET = 42, FUSE_FALLOCATE = 43, + FUSE_READDIRPLUS = 44, /* CUSE specific operations */ CUSE_INIT = 4096, @@ -630,6 +632,16 @@ struct fuse_dirent { #define FUSE_DIRENT_SIZE(d) \ FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET + (d)->namelen) +struct fuse_direntplus { + struct fuse_entry_out entry_out; + struct fuse_dirent dirent; +}; + +#define FUSE_NAME_OFFSET_DIRENTPLUS \ + offsetof(struct fuse_direntplus, dirent.name) +#define FUSE_DIRENTPLUS_SIZE(d) \ + FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET_DIRENTPLUS + (d)->dirent.namelen) + struct fuse_notify_inval_inode_out { __u64 ino; __s64 off; -- 1.7.12.4 |
From: Feng S. <ste...@gm...> - 2013-01-15 03:24:16
|
Use the same adaptive readdirplus mechanism as NFS: http://permalink.gmane.org/gmane.linux.nfs/49299 If the user space implementation wants to disable readdirplus temporarily, it could just return ENOTSUPP. Then kernel will recall it with readdir. Signed-off-by: Feng Shuo <ste...@gm...> --- fs/fuse/dir.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++---- fs/fuse/fuse_i.h | 10 ++++++++++ fs/fuse/inode.c | 3 +++ 3 files changed, 59 insertions(+), 4 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index dcc1e52..b9975df 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -14,6 +14,34 @@ #include <linux/namei.h> #include <linux/slab.h> +static bool fuse_use_readdirplus(struct inode *dir, struct file *filp) +{ + struct fuse_conn *fc = get_fuse_conn(dir); + struct fuse_inode *fi = get_fuse_inode(dir); + + if (!fc->do_readdirplus) + return false; + if (test_and_clear_bit(FUSE_I_ADVISE_RDPLUS, &fi->state)) + return true; + if (filp->f_pos == 0) + return true; + return false; +} + +static void fuse_advise_use_readdirplus(struct inode *dir) +{ + struct fuse_inode *fi = get_fuse_inode(dir); + + set_bit(FUSE_I_ADVISE_RDPLUS, &fi->state); +} + +static void fuse_stop_use_readdirplus(struct inode *dir) +{ + struct fuse_inode *fi = get_fuse_inode(dir); + + clear_bit(FUSE_I_ADVISE_RDPLUS, &fi->state); +} + #if BITS_PER_LONG >= 64 static inline void fuse_dentry_settime(struct dentry *entry, u64 time) { @@ -219,6 +247,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) attr_version); fuse_change_entry_timeout(entry, &outarg); } + fuse_advise_use_readdirplus(inode); return 1; } @@ -355,6 +384,7 @@ static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry, else fuse_invalidate_entry_cache(entry); + fuse_advise_use_readdirplus(dir); return newent; out_iput: @@ -1294,7 +1324,7 @@ static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file, static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) { - int err; + int plus, err; size_t nbytes; struct page *page; struct inode *inode = file->f_path.dentry->d_inode; @@ -1314,10 +1344,13 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) fuse_put_request(fc, req); return -ENOMEM; } + + plus = fuse_use_readdirplus(inode, file); +read_again: req->out.argpages = 1; req->num_pages = 1; req->pages[0] = page; - if (fc->do_readdirplus) { + if (plus) { attr_version = fuse_get_attr_version(fc); fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, FUSE_READDIRPLUS); @@ -1329,8 +1362,17 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) nbytes = req->out.args[0].size; err = req->out.h.error; fuse_put_request(fc, req); + if ((err == -ENOTSUPP) && plus) { + fuse_stop_use_readdirplus(inode); + plus = 0; + req = fuse_get_req(fc); + if (!IS_ERR(req)) + goto read_again; + err = PTR_ERR(req); + goto out; + } if (!err) { - if (fc->do_readdirplus) { + if (plus) { err = parse_dirplusfile(page_address(page), nbytes, file, dstbuf, filldir, attr_version); @@ -1339,7 +1381,7 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) dstbuf, filldir); } } - +out: __free_page(page); fuse_invalidate_attr(inode); /* atime changed */ return err; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 5c50553..e582603 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -103,6 +103,16 @@ struct fuse_inode { /** List of writepage requestst (pending or sent) */ struct list_head writepages; + + /** Miscellaneous bits describing inode state */ + unsigned long state; +}; + +/** FUSE inode state bits */ +enum { + /** Advise readdirplus */ + FUSE_I_ADVISE_RDPLUS, + /* FUSE_I_MTIME_UPDATED, */ }; struct fuse_conn; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 6f7d574..4ba1cf5 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -255,6 +255,9 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr) new_decode_dev(attr->rdev)); } else BUG(); + + /* The readdir_plus is disabled initially. */ + get_fuse_inode(inode)->state = 0; } int fuse_inode_eq(struct inode *inode, void *_nodeidp) -- 1.7.12.4 |
From: Miklos S. <mi...@sz...> - 2013-01-25 14:16:43
|
On Tue, Jan 15, 2013 at 4:23 AM, Feng Shuo <ste...@gm...> wrote: > Use the same adaptive readdirplus mechanism as NFS: > > http://permalink.gmane.org/gmane.linux.nfs/49299 OK, applied with minor updates. > If the user space implementation wants to disable readdirplus > temporarily, it could just return ENOTSUPP. Then kernel will > recall it with readdir. I removed this part. It may make sense, but it's not part of the adaptive readdirplus algorithm, so I'd like to see this as a separate patch. Thanks, Miklos > Signed-off-by: Feng Shuo <ste...@gm...> > --- > fs/fuse/dir.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++---- > fs/fuse/fuse_i.h | 10 ++++++++++ > fs/fuse/inode.c | 3 +++ > 3 files changed, 59 insertions(+), 4 deletions(-) > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > index dcc1e52..b9975df 100644 > --- a/fs/fuse/dir.c > +++ b/fs/fuse/dir.c > @@ -14,6 +14,34 @@ > #include <linux/namei.h> > #include <linux/slab.h> > > +static bool fuse_use_readdirplus(struct inode *dir, struct file *filp) > +{ > + struct fuse_conn *fc = get_fuse_conn(dir); > + struct fuse_inode *fi = get_fuse_inode(dir); > + > + if (!fc->do_readdirplus) > + return false; > + if (test_and_clear_bit(FUSE_I_ADVISE_RDPLUS, &fi->state)) > + return true; > + if (filp->f_pos == 0) > + return true; > + return false; > +} > + > +static void fuse_advise_use_readdirplus(struct inode *dir) > +{ > + struct fuse_inode *fi = get_fuse_inode(dir); > + > + set_bit(FUSE_I_ADVISE_RDPLUS, &fi->state); > +} > + > +static void fuse_stop_use_readdirplus(struct inode *dir) > +{ > + struct fuse_inode *fi = get_fuse_inode(dir); > + > + clear_bit(FUSE_I_ADVISE_RDPLUS, &fi->state); > +} > + > #if BITS_PER_LONG >= 64 > static inline void fuse_dentry_settime(struct dentry *entry, u64 time) > { > @@ -219,6 +247,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) > attr_version); > fuse_change_entry_timeout(entry, &outarg); > } > + fuse_advise_use_readdirplus(inode); > return 1; > } > > @@ -355,6 +384,7 @@ static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry, > else > fuse_invalidate_entry_cache(entry); > > + fuse_advise_use_readdirplus(dir); > return newent; > > out_iput: > @@ -1294,7 +1324,7 @@ static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file, > > static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) > { > - int err; > + int plus, err; > size_t nbytes; > struct page *page; > struct inode *inode = file->f_path.dentry->d_inode; > @@ -1314,10 +1344,13 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) > fuse_put_request(fc, req); > return -ENOMEM; > } > + > + plus = fuse_use_readdirplus(inode, file); > +read_again: > req->out.argpages = 1; > req->num_pages = 1; > req->pages[0] = page; > - if (fc->do_readdirplus) { > + if (plus) { > attr_version = fuse_get_attr_version(fc); > fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, > FUSE_READDIRPLUS); > @@ -1329,8 +1362,17 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) > nbytes = req->out.args[0].size; > err = req->out.h.error; > fuse_put_request(fc, req); > + if ((err == -ENOTSUPP) && plus) { > + fuse_stop_use_readdirplus(inode); > + plus = 0; > + req = fuse_get_req(fc); > + if (!IS_ERR(req)) > + goto read_again; > + err = PTR_ERR(req); > + goto out; > + } > if (!err) { > - if (fc->do_readdirplus) { > + if (plus) { > err = parse_dirplusfile(page_address(page), nbytes, > file, dstbuf, filldir, > attr_version); > @@ -1339,7 +1381,7 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) > dstbuf, filldir); > } > } > - > +out: > __free_page(page); > fuse_invalidate_attr(inode); /* atime changed */ > return err; > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > index 5c50553..e582603 100644 > --- a/fs/fuse/fuse_i.h > +++ b/fs/fuse/fuse_i.h > @@ -103,6 +103,16 @@ struct fuse_inode { > > /** List of writepage requestst (pending or sent) */ > struct list_head writepages; > + > + /** Miscellaneous bits describing inode state */ > + unsigned long state; > +}; > + > +/** FUSE inode state bits */ > +enum { > + /** Advise readdirplus */ > + FUSE_I_ADVISE_RDPLUS, > + /* FUSE_I_MTIME_UPDATED, */ > }; > > struct fuse_conn; > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > index 6f7d574..4ba1cf5 100644 > --- a/fs/fuse/inode.c > +++ b/fs/fuse/inode.c > @@ -255,6 +255,9 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr) > new_decode_dev(attr->rdev)); > } else > BUG(); > + > + /* The readdir_plus is disabled initially. */ > + get_fuse_inode(inode)->state = 0; > } > > int fuse_inode_eq(struct inode *inode, void *_nodeidp) > -- > 1.7.12.4 > > > ------------------------------------------------------------------------------ > Master SQL Server Development, Administration, T-SQL, SSAS, SSIS, SSRS > and more. Get SQL Server skills now (including 2012) with LearnDevNow - > 200+ hours of step-by-step video tutorials by Microsoft MVPs and experts. > SALE $99.99 this month only - learn more at: > http://p.sf.net/sfu/learnmore_122512 > _______________________________________________ > fuse-devel mailing list > fus...@li... > https://lists.sourceforge.net/lists/listinfo/fuse-devel |
From: Feng S. <ste...@gm...> - 2013-01-31 15:17:15
|
Miklos, Are you using the "for-next" branch of the following tree? http://git.kernel.org/?p=linux/kernel/git/mszeredi/fuse.git;a=summary I didn't see the first part applied. If not, I could rebase to that branch and send a patch set with the second part split out as a separated patch. On Fri, Jan 25, 2013 at 10:16 PM, Miklos Szeredi <mi...@sz...> wrote: > On Tue, Jan 15, 2013 at 4:23 AM, Feng Shuo <ste...@gm...> wrote: >> Use the same adaptive readdirplus mechanism as NFS: >> >> http://permalink.gmane.org/gmane.linux.nfs/49299 > > OK, applied with minor updates. > >> If the user space implementation wants to disable readdirplus >> temporarily, it could just return ENOTSUPP. Then kernel will >> recall it with readdir. > > I removed this part. It may make sense, but it's not part of the > adaptive readdirplus algorithm, so I'd like to see this as a separate > patch. > > Thanks, > Miklos > > > >> Signed-off-by: Feng Shuo <ste...@gm...> >> --- >> fs/fuse/dir.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++---- >> fs/fuse/fuse_i.h | 10 ++++++++++ >> fs/fuse/inode.c | 3 +++ >> 3 files changed, 59 insertions(+), 4 deletions(-) >> >> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c >> index dcc1e52..b9975df 100644 >> --- a/fs/fuse/dir.c >> +++ b/fs/fuse/dir.c >> @@ -14,6 +14,34 @@ >> #include <linux/namei.h> >> #include <linux/slab.h> >> >> +static bool fuse_use_readdirplus(struct inode *dir, struct file *filp) >> +{ >> + struct fuse_conn *fc = get_fuse_conn(dir); >> + struct fuse_inode *fi = get_fuse_inode(dir); >> + >> + if (!fc->do_readdirplus) >> + return false; >> + if (test_and_clear_bit(FUSE_I_ADVISE_RDPLUS, &fi->state)) >> + return true; >> + if (filp->f_pos == 0) >> + return true; >> + return false; >> +} >> + >> +static void fuse_advise_use_readdirplus(struct inode *dir) >> +{ >> + struct fuse_inode *fi = get_fuse_inode(dir); >> + >> + set_bit(FUSE_I_ADVISE_RDPLUS, &fi->state); >> +} >> + >> +static void fuse_stop_use_readdirplus(struct inode *dir) >> +{ >> + struct fuse_inode *fi = get_fuse_inode(dir); >> + >> + clear_bit(FUSE_I_ADVISE_RDPLUS, &fi->state); >> +} >> + >> #if BITS_PER_LONG >= 64 >> static inline void fuse_dentry_settime(struct dentry *entry, u64 time) >> { >> @@ -219,6 +247,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) >> attr_version); >> fuse_change_entry_timeout(entry, &outarg); >> } >> + fuse_advise_use_readdirplus(inode); >> return 1; >> } >> >> @@ -355,6 +384,7 @@ static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry, >> else >> fuse_invalidate_entry_cache(entry); >> >> + fuse_advise_use_readdirplus(dir); >> return newent; >> >> out_iput: >> @@ -1294,7 +1324,7 @@ static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file, >> >> static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) >> { >> - int err; >> + int plus, err; >> size_t nbytes; >> struct page *page; >> struct inode *inode = file->f_path.dentry->d_inode; >> @@ -1314,10 +1344,13 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) >> fuse_put_request(fc, req); >> return -ENOMEM; >> } >> + >> + plus = fuse_use_readdirplus(inode, file); >> +read_again: >> req->out.argpages = 1; >> req->num_pages = 1; >> req->pages[0] = page; >> - if (fc->do_readdirplus) { >> + if (plus) { >> attr_version = fuse_get_attr_version(fc); >> fuse_read_fill(req, file, file->f_pos, PAGE_SIZE, >> FUSE_READDIRPLUS); >> @@ -1329,8 +1362,17 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) >> nbytes = req->out.args[0].size; >> err = req->out.h.error; >> fuse_put_request(fc, req); >> + if ((err == -ENOTSUPP) && plus) { >> + fuse_stop_use_readdirplus(inode); >> + plus = 0; >> + req = fuse_get_req(fc); >> + if (!IS_ERR(req)) >> + goto read_again; >> + err = PTR_ERR(req); >> + goto out; >> + } >> if (!err) { >> - if (fc->do_readdirplus) { >> + if (plus) { >> err = parse_dirplusfile(page_address(page), nbytes, >> file, dstbuf, filldir, >> attr_version); >> @@ -1339,7 +1381,7 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) >> dstbuf, filldir); >> } >> } >> - >> +out: >> __free_page(page); >> fuse_invalidate_attr(inode); /* atime changed */ >> return err; >> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h >> index 5c50553..e582603 100644 >> --- a/fs/fuse/fuse_i.h >> +++ b/fs/fuse/fuse_i.h >> @@ -103,6 +103,16 @@ struct fuse_inode { >> >> /** List of writepage requestst (pending or sent) */ >> struct list_head writepages; >> + >> + /** Miscellaneous bits describing inode state */ >> + unsigned long state; >> +}; >> + >> +/** FUSE inode state bits */ >> +enum { >> + /** Advise readdirplus */ >> + FUSE_I_ADVISE_RDPLUS, >> + /* FUSE_I_MTIME_UPDATED, */ >> }; >> >> struct fuse_conn; >> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c >> index 6f7d574..4ba1cf5 100644 >> --- a/fs/fuse/inode.c >> +++ b/fs/fuse/inode.c >> @@ -255,6 +255,9 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr) >> new_decode_dev(attr->rdev)); >> } else >> BUG(); >> + >> + /* The readdir_plus is disabled initially. */ >> + get_fuse_inode(inode)->state = 0; >> } >> >> int fuse_inode_eq(struct inode *inode, void *_nodeidp) >> -- >> 1.7.12.4 >> >> >> ------------------------------------------------------------------------------ >> Master SQL Server Development, Administration, T-SQL, SSAS, SSIS, SSRS >> and more. Get SQL Server skills now (including 2012) with LearnDevNow - >> 200+ hours of step-by-step video tutorials by Microsoft MVPs and experts. >> SALE $99.99 this month only - learn more at: >> http://p.sf.net/sfu/learnmore_122512 >> _______________________________________________ >> fuse-devel mailing list >> fus...@li... >> https://lists.sourceforge.net/lists/listinfo/fuse-devel -- Feng Shuo Tel: (86)10-59851155-2116 Fax: (86)10-59851155-2008 Tianjin Zhongke Blue Whale Information Technologies Co., Ltd 10th Floor, Tower A, The GATE building, No. 19 Zhong-guan-cun Avenue Haidian District, Beijing, China Postcode 100080 |
From: Miklos S. <mi...@sz...> - 2013-01-31 16:29:14
|
On Thu, Jan 31, 2013 at 4:16 PM, Feng Shuo <ste...@gm...> wrote: > Miklos, > > Are you using the "for-next" branch of the following tree? > > http://git.kernel.org/?p=linux/kernel/git/mszeredi/fuse.git;a=summary > > I didn't see the first part applied. If not, I could rebase to that > branch and send a patch set with the second part split out as a > separated patch. Committed and pushed now. Sorry. Thanks, Miklos |
From: Feng S. <ste...@gm...> - 2013-02-01 02:25:52
|
With this patch, if the userspace implementation wants to disable readdirplus temporarily, it can just return ENOTSUPP. Then kernel will recall it with normal readdir. Signed-off-by: Feng Shuo <ste...@gm...> --- fs/fuse/dir.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index dc5e648..8d00265 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -35,6 +35,13 @@ static void fuse_advise_use_readdirplus(struct inode *dir) set_bit(FUSE_I_ADVISE_RDPLUS, &fi->state); } +static void fuse_stop_use_readdirplus(struct inode *dir) +{ + struct fuse_inode *fi = get_fuse_inode(dir); + + clear_bit(FUSE_I_ADVISE_RDPLUS, &fi->state); +} + #if BITS_PER_LONG >= 64 static inline void fuse_dentry_settime(struct dentry *entry, u64 time) { @@ -1335,6 +1342,7 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) } plus = fuse_use_readdirplus(inode, file); +read_again: req->out.argpages = 1; req->num_pages = 1; req->pages[0] = page; @@ -1351,6 +1359,19 @@ static int fuse_readdir(struct file *file, void *dstbuf, filldir_t filldir) nbytes = req->out.args[0].size; err = req->out.h.error; fuse_put_request(fc, req); + + /* + * Recall with normal readdir if the userspace wants to disable + * readdirplus temporarily. + */ + if ((err == -ENOTSUPP) && plus) { + fuse_stop_use_readdirplus(inode); + plus = 0; + req = fuse_get_req(fc, 1); + if (!IS_ERR(req)) + goto read_again; + err = PTR_ERR(req); + } if (!err) { if (plus) { err = parse_dirplusfile(page_address(page), nbytes, -- 1.7.12.4 |
From: Eric W. <nor...@yh...> - 2013-02-06 00:48:35
|
This switches the -o no_readdir_plus option to a tristate string: -o readdirplus=(never|always|adaptive) Telling the kernel to always use readdirplus is beneficial to filesystems (e.g. GlusterFS) where the cost to perform readdir and readdirplus are identical. The default remains "adaptive" (if supported). Signed-off-by: Eric Wong <nor...@yh...> --- This depends on several readdirplus-related patches posted to the list. I've tested this with my "fuse3" branch of fusedav[1] I've pushed my own fuse repository out containing dependent patches to ease review: The following changes since commit eca08beaf5a7b4121da27c2a927a6ecbb08a74bf: libfuse: fix fuse_get_context() in non fuse threads (2013-02-05 13:36:06 +0100) are available in the git repository at: git://bogomips.org/fuse fuse3 for you to fetch changes up to 4b0a856fcea1f438bc8dcb23e61998056c952e79: allow disabling adaptive readdirplus (2013-02-06 00:43:05 +0000) [1] git://bogomips.org/fusedav fuse3 ---------------------------------------------------------------- Eric Wong (2): implement readdir_plus support in high-level API allow disabling adaptive readdirplus Feng Shuo (2): Add '[no_]auto_inval_data' mount option. Readdirplus support for fuse_lowlevel_ops include/fuse_common.h | 1 + include/fuse_kernel.h | 1 + lib/fuse_i.h | 2 +- lib/fuse_lowlevel.c | 48 ++++++++++++++++++++++++++++++++++++++++++------ 4 files changed, 45 insertions(+), 7 deletions(-) diff --git a/include/fuse_common.h b/include/fuse_common.h index a86f048..987fe92 100644 --- a/include/fuse_common.h +++ b/include/fuse_common.h @@ -113,6 +113,7 @@ struct fuse_file_info { #define FUSE_CAP_IOCTL_DIR (1 << 11) #define FUSE_CAP_AUTO_INVAL_DATA (1 << 12) #define FUSE_CAP_READDIR_PLUS (1 << 13) +#define FUSE_CAP_ADAPT_READDIR_PLUS (1 << 14) /** * Ioctl flags diff --git a/include/fuse_kernel.h b/include/fuse_kernel.h index 303384f..58b2e82 100644 --- a/include/fuse_kernel.h +++ b/include/fuse_kernel.h @@ -212,6 +212,7 @@ struct fuse_file_lock { #define FUSE_FLOCK_LOCKS (1 << 10) #define FUSE_AUTO_INVAL_DATA (1 << 12) #define FUSE_DO_READDIRPLUS (1 << 13) +#define FUSE_ADAPT_READDIRPLUS (1 << 14) /** * CUSE INIT request/reply flags diff --git a/lib/fuse_i.h b/lib/fuse_i.h index 141eb3f..02426a9 100644 --- a/lib/fuse_i.h +++ b/lib/fuse_i.h @@ -73,7 +73,7 @@ struct fuse_ll { int no_splice_read; int auto_inval_data; int no_auto_inval_data; - int no_readdir_plus; + char *readdirplus_type; struct fuse_lowlevel_ops op; int got_init; struct cuse_data *cuse_data; diff --git a/lib/fuse_lowlevel.c b/lib/fuse_lowlevel.c index d4f1c49..385ea34 100644 --- a/lib/fuse_lowlevel.c +++ b/lib/fuse_lowlevel.c @@ -1826,6 +1826,8 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg) f->conn.capable |= FUSE_CAP_AUTO_INVAL_DATA; if (arg->flags & FUSE_DO_READDIRPLUS) f->conn.capable |= FUSE_CAP_READDIR_PLUS; + if (arg->flags & FUSE_ADAPT_READDIRPLUS) + f->conn.capable |= FUSE_CAP_ADAPT_READDIR_PLUS; } else { f->conn.async_read = 0; f->conn.max_readahead = 0; @@ -1858,8 +1860,13 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg) f->conn.want |= FUSE_CAP_BIG_WRITES; if (f->auto_inval_data) f->conn.want |= FUSE_CAP_AUTO_INVAL_DATA; - if (f->op.readdir_plus && !f->no_readdir_plus) - f->conn.want |= FUSE_CAP_READDIR_PLUS; + if (f->op.readdir_plus) { + if (strcmp(f->readdirplus_type, "adaptive") == 0) + f->conn.want |= FUSE_CAP_READDIR_PLUS + | FUSE_CAP_ADAPT_READDIR_PLUS; + if (strcmp(f->readdirplus_type, "always") == 0) + f->conn.want |= FUSE_CAP_READDIR_PLUS; + } if (bufsize < FUSE_MIN_READ_BUFFER) { fprintf(stderr, "fuse: warning: buffer size too small: %zu\n", @@ -1883,9 +1890,10 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg) f->conn.want &= ~FUSE_CAP_SPLICE_MOVE; if (f->no_auto_inval_data) f->conn.want &= ~FUSE_CAP_AUTO_INVAL_DATA; - if (f->no_readdir_plus) + if (strcmp(f->readdirplus_type, "never") == 0) f->conn.want &= ~FUSE_CAP_READDIR_PLUS; - + if (strcmp(f->readdirplus_type, "always") == 0) + f->conn.want &= ~FUSE_CAP_ADAPT_READDIR_PLUS; if (f->conn.async_read || (f->conn.want & FUSE_CAP_ASYNC_READ)) outarg.flags |= FUSE_ASYNC_READ; if (f->conn.want & FUSE_CAP_POSIX_LOCKS) @@ -1904,6 +1912,8 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg) outarg.flags |= FUSE_AUTO_INVAL_DATA; if (f->conn.want & FUSE_CAP_READDIR_PLUS) outarg.flags |= FUSE_DO_READDIRPLUS; + if (f->conn.want & FUSE_CAP_ADAPT_READDIR_PLUS) + outarg.flags |= FUSE_CAP_ADAPT_READDIR_PLUS; outarg.max_readahead = f->conn.max_readahead; outarg.max_write = f->conn.max_write; if (f->conn.proto_minor >= 13) { @@ -2529,7 +2539,7 @@ static const struct fuse_opt fuse_ll_opts[] = { { "no_splice_read", offsetof(struct fuse_ll, no_splice_read), 1}, { "auto_inval_data", offsetof(struct fuse_ll, auto_inval_data), 1}, { "no_auto_inval_data", offsetof(struct fuse_ll, no_auto_inval_data), 1}, - { "no_readdir_plus", offsetof(struct fuse_ll, no_readdir_plus), 1}, + { "readdirplus=%s", offsetof(struct fuse_ll, readdirplus_type), 1}, FUSE_OPT_KEY("max_read=", FUSE_OPT_KEY_DISCARD), FUSE_OPT_KEY("-h", KEY_HELP), FUSE_OPT_KEY("--help", KEY_HELP), @@ -2562,7 +2572,7 @@ static void fuse_ll_help(void) " -o [no_]splice_move move data while splicing to the fuse device\n" " -o [no_]splice_read use splice to read from the fuse device\n" " -o [no_]auto_inval_data use automatic kernel cache invalidation logic\n" -" -o [no_]readdir_plus use readdir_plus if possible.\n" +" -o readdirplus=S control readdirplus use (never|always|adaptive)\n" ); } @@ -2607,6 +2617,7 @@ static void fuse_ll_destroy(void *data) pthread_key_delete(f->pipe_key); pthread_mutex_destroy(&f->lock); free(f->cuse_data); + free(f->readdirplus_type); free(f); } @@ -2730,6 +2741,28 @@ static int fuse_ll_receive_buf(struct fuse_session *se, struct fuse_buf *buf, } #endif +static int fuse_readdirplus_validate(struct fuse_ll *f) +{ + static const char allowed[] = "never\0always\0adaptive\0"; + const char *cur; + + if (!f->readdirplus_type) { + f->readdirplus_type = strdup("adaptive"); + if (!f->readdirplus_type) { + fprintf(stderr, "fuse: failed to allocate memory\n"); + return -1; + } + } + + for (cur = allowed; *cur; cur = strchr(cur, 0) + 1) { + if (strcmp(f->readdirplus_type, cur) == 0) + return 0; + } + + fprintf(stderr, "invalid value for readdirplus=%s\n", + f->readdirplus_type); + return -1; +} struct fuse_session *fuse_lowlevel_new(struct fuse_args *args, const struct fuse_lowlevel_ops *op, @@ -2773,6 +2806,8 @@ struct fuse_session *fuse_lowlevel_new(struct fuse_args *args, if (fuse_opt_parse(args, f, fuse_ll_opts, fuse_ll_opt_proc) == -1) goto out_key_destroy; + if (fuse_readdirplus_validate(f) == -1) + goto out_key_destroy; if (f->debug) fprintf(stderr, "FUSE library version: %s\n", PACKAGE_VERSION); @@ -2794,6 +2829,7 @@ out_key_destroy: pthread_key_delete(f->pipe_key); out_free: pthread_mutex_destroy(&f->lock); + free(f->readdirplus_type); free(f); out: return NULL; -- 1.8.1.2.434.g9a6c84e |
From: Eric W. <nor...@yh...> - 2013-02-06 01:21:18
|
Eric Wong <nor...@yh...> wrote: > -" -o [no_]readdir_plus use readdir_plus if possible.\n" > +" -o readdirplus=S control readdirplus use (never|always|adaptive)\n" Btw, I've been confused all along by the inconsistency between "readdir_plus" vs "readdirplus" in the library patches posted. Since the kernel module and NFS seem to favor "readdirplus", maybe we should go with that in the library, too... |
From: Miklos S. <mi...@sz...> - 2013-02-06 17:53:08
|
On Wed, Feb 6, 2013 at 2:21 AM, Eric Wong <nor...@yh...> wrote: > Eric Wong <nor...@yh...> wrote: >> -" -o [no_]readdir_plus use readdir_plus if possible.\n" >> +" -o readdirplus=S control readdirplus use (never|always|adaptive)\n" > > Btw, I've been confused all along by the inconsistency between > "readdir_plus" vs "readdirplus" in the library patches posted. > > Since the kernel module and NFS seem to favor "readdirplus", maybe we > should go with that in the library, too... Yes, and please make the that consistent in the FUSE_CAP names as well. And make the option "readdirplus=(yes|no|auto)" And instead of the FUSE_CAP_ADAPT_READDIR_PLUS use FUSE_CAP_READDIRPLUS_AUTO. Thanks, Miklos |
From: Eric W. <nor...@yh...> - 2013-02-07 02:52:55
|
This series applies on top of Feng Shuo's earlier patches to the library: * Add '[no_]auto_inval_data' mount option. * Readdirplus support for fuse_lowlevel_ops And incorporates suggestions from Miklos Eric Wong (3): libfuse: rename "readdir_plus" to "readdirplus" libfuse: implement readdirplus for high-level API libfuse: allow disabling adaptive readdirplus include/fuse.h | 12 ++++ include/fuse_common.h | 3 +- include/fuse_kernel.h | 1 + include/fuse_lowlevel.h | 2 +- lib/fuse.c | 181 ++++++++++++++++++++++++++++++++++++------------ lib/fuse_i.h | 2 +- lib/fuse_lowlevel.c | 63 +++++++++++++---- lib/fuse_versionscript | 1 + 8 files changed, 204 insertions(+), 61 deletions(-) |
From: Miklos S. <mi...@sz...> - 2014-03-05 14:05:06
|
On Tue, Feb 26, 2013 at 11:57 AM, Eric Wong <nor...@yh...> wrote: > Miklos Szeredi <mi...@sz...> wrote: >> On Sun, Feb 17, 2013 at 2:25 AM, Feng Shuo <ste...@gm...> wrote: >> > On Fri, Feb 8, 2013 at 3:22 PM, Eric Wong <nor...@yh...> wrote: >> >> Miklos Szeredi <mi...@sz...> wrote: >> >>> On Thu, Feb 7, 2013 at 7:26 PM, Eric Wong <nor...@yh...> wrote: >> >>> > Miklos Szeredi <mi...@sz...> wrote: >> >>> >> On Thu, Feb 7, 2013 at 3:52 AM, Eric Wong <nor...@yh...> wrote: >> >>> >> > Existing FUSE filesystems (e.g. fusedav) with a complete >> >>> >> > "filler" function for their existing readdir implementation >> >>> >> > can simply reuse their readdir callback as the readdirplus >> >>> >> > callback. >> >>> >> >> >>> >> Why introduce a new callback at all? It's 100% backward compatible >> >>> >> with the old one, no? >> >>> > >> >>> > I was worried some filesystems would have bogus/uninitialized >> >>> > data in the rest of the stat buffer, since we only used part >> >>> > of that before. >> >>> >> >>> Okay, that's a valid concern. >> >>> >> >>> But with 3.0 we're breaking API/ABI compatibility, so what we can do >> >>> is rename readdir to readdirplus so it is clear that the behavior >> >>> changed. And then add a "flag_no_readdirplus" to fuse_operations, >> >>> indicating that the filesystem wants the old interface. >> >>> >> >>> Or it could be the other way round, leave the name readdir and add a >> >>> "flag_readdirplus_ok". >> >> >> >> Since we support adaptive readdirplus, we should expect some FSes to >> >> benefit from having a separate readdir and readdirplus implementation >> >> for performance. >> > >> > How about readdir_plus(...., plus)? I know it might not look good for >> > some point of view.... >> >> I think it makes sense, except then we really don't need the _plus in >> the function name since it's there in the extra argument. > > I'm not sure if exposing an extra argument to the public API is > a good idea, either. readdir already has 5 args (which seems like > a lot to remember). > > Perhaps if we go with 6 args, the last one should be a flags mask to > allow for future expansion: > > int (*readdir) (const char *, void *, fuse_fill_dir_t, off_t, > struct fuse_file_info *, unsigned int flags); > > Anyways, I shall defer to Miklos for API decisions. > I will be happy to implement whatever he chooses, though. > > (sorry, haven't had much time for FUSE hacking lately, hopefully > will have time this coming week/end) Just pushed out a readdirplus solution for the high-level API to the fuse git. It's based on Eric's patch, but not much of the original remains. Please have a look, test and comment. Thanks, Miklos |