From: G. U. <go...@ud...> - 2008-06-08 14:06:16

I just discovered the kernel module in the fuse package doesn't build
any longer with kernel 2.6.25. Looking in the archive I see that this
has been mentioned previously, and that as long as I do not need NFS
exporting support, there's no reason to use the kernel module from the
fuse distribution.

I do need NFS exporting support. Does anyone have any advice what to
do in that case? The regular fuse module that comes with the kernel
2.6.25 does still not seem to support it.

From: Miklos S. <mi...@sz...> - 2008-06-16 09:22:41

> I just discovered the kernel module in the fuse package doesn't build
> any longer with kernel 2.6.25. Looking in the archive I see that this
> has been mentioned previously, and that as long as I do not need NFS
> exporting support, there's no reason to use the kernel module from the
> fuse distribution.
>
> I do need NFS exporting support. Does anyone have any advice what to
> do in that case?

You can

 - try 2.6.26-rc5-mm3, which already has NFS export support for fuse

 - apply the patches from http://lkml.org/lkml/2008/5/15/351 to 2.6.25

 - wait for 2.6.27 ;)

> The regular fuse module that comes with the kernel
> 2.6.25 does still not seem to support it.

Yes. If I have time before 2.6.27, I'll release fuse-2.7.4 with
support for 2.6.25 and 2.6.26.

Thanks,
Miklos

From: G. U. <go...@ud...> - 2008-06-16 21:03:19

Miklos Szeredi writes:

> You can
>
>  - try 2.6.26-rc5-mm3, which already has NFS export support for fuse
>
>  - apply the patches from http://lkml.org/lkml/2008/5/15/351 to 2.6.25
>
>  - wait for 2.6.27 ;)

Ok, I see. Number 3 indicates that the traditional human attitude to a
problem: "ignore it and hope it goes away by itself" would actually work
in this case. But I'll try to find time to give number 2 a try first.

Thanks.

From: Brian W. <ywa...@ho...> - 2008-06-17 01:02:43

Miklos,

with 2.6.26-rc6 and the NFS patch plus the userland patch you posted, all
the writes are still 4k blocks.

Is there anything that needs to be done to enable >4k writes?

Thanks

Brian

--------------------------------------------------
From: "Miklos Szeredi" <mi...@sz...>
Sent: Monday, June 16, 2008 3:22 AM
To: <go...@ud...>
Cc: <fus...@li...>
Subject: Re: [fuse-devel] fuse 2.7.3, kernel 2.6.25, and NFS

>> I just discovered the kernel module in the fuse package doesn't build
>> any longer with kernel 2.6.25. Looking in the archive I see that this
>> has been mentioned previously, and that as long as I do not need NFS
>> exporting support, there's no reason to use the kernel module from the
>> fuse distribution.
>>
>> I do need NFS exporting support. Does anyone have any advice what to
>> do in that case?
>
> You can
>
>  - try 2.6.26-rc5-mm3, which already has NFS export support for fuse
>
>  - apply the patches from http://lkml.org/lkml/2008/5/15/351 to 2.6.25
>
>  - wait for 2.6.27 ;)
>
>> The regular fuse module that comes with the kernel
>> 2.6.25 does still not seem to support it.
>
> Yes. If I have time before 2.6.27, I'll release fuse-2.7.4 with
> support for 2.6.25 and 2.6.26.
>
> Thanks,
> Miklos
>
> -------------------------------------------------------------------------
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services for
> just about anything Open Source.
> http://sourceforge.net/services/buy/index.php
> _______________________________________________
> fuse-devel mailing list
> fus...@li...
> https://lists.sourceforge.net/lists/listinfo/fuse-devel

From: Szabolcs S. <sz...@nt...> - 2008-06-17 02:07:52

On Mon, 16 Jun 2008, Brian Wang wrote:

> with 2.6.26-rc6 and the NFS patch plus the userland patch you posted, all
> the writes are still 4k blocks.
>
> Is there anything that needs to be done to enable >4k writes?

You need the -o big_writes mount option:

	http://mercurial.creo.hu/repos/fuse-hg/index.cgi/rev/27197dfa6194

	Szaka

--
NTFS-3G: http://ntfs-3g.org

From: Brian W. <ywa...@ho...> - 2008-06-17 03:35:36

Thanks very much. But it is still not right, and there are more problems.

1. Used -o big_writes and noforget. Still see 4k writes.

2. On write, "dd if=/dev/zero of=/mnt/10g bs=1M count=1000" dropped 30-40%
in throughput compared to the CVS version (30-40MB/sec over 5 runs, compared
to 50-70MB/sec with the CVS version). CPU usage dropped about 40%.

3. On read, it takes forever to read the file back with dd. It used to be
much faster than write (100MB/sec via NFS). CPU usage increases
dramatically, from 10-20% to 100-200% (as shown by top). The maximum
throughput I got so far is 40MB/sec, the lowest is <10MB/sec. (I got
100MB/sec or so consistently with the CVS version on kernel 2.6.24.)

There is no change in my own code. I will try with fusexmp, but I suspect
it would be the same.

I don't think this is what is expected. Does anybody see the same? Or does
the patch only work with the mm kernel for now?

After falling back to the CVS version and kernel 2.6.24, everything works as
before. By the way, these are all part of my automated tests, which have
been working well with the CVS version; the only problem is the usual Stale
NFS handle problem.

Thanks

Brian.

--------------------------------------------------
From: "Szabolcs Szakacsits" <sz...@nt...>
Sent: Monday, June 16, 2008 8:10 PM
To: "Brian Wang" <ywa...@ho...>
Cc: "Miklos Szeredi" <mi...@sz...>; <fus...@li...>
Subject: Re: enabling >4k writes (was: Re: fuse 2.7.3, kernel 2.6.25, and NFS)

> On Mon, 16 Jun 2008, Brian Wang wrote:
>
>> with 2.6.26-rc6 and the NFS patch plus the userland patch you posted,
>> all the writes are still 4k blocks.
>>
>> Is there anything that needs to be done to enable >4k writes?
>
> You need the -o big_writes mount option:
> http://mercurial.creo.hu/repos/fuse-hg/index.cgi/rev/27197dfa6194
>
> Szaka
>
> --
> NTFS-3G: http://ntfs-3g.org

From: Szabolcs S. <sz...@nt...> - 2008-06-17 09:48:19

On Mon, 16 Jun 2008, Brian Wang wrote:

> Thanks very much. But it is still not right, and there are more problems.
>
> 1. Used -o big_writes and noforget. Still see 4k writes.

Hmmm, you need fuse cvs user space but it seems I can reproduce the
problem.

	Szaka

--
NTFS-3G: http://ntfs-3g.org

From: Miklos S. <mi...@sz...> - 2008-06-17 12:54:51

> > Thanks very much. But it is still not right, and there are more problems.
> >
> > 1. Used -o big_writes and noforget. Still see 4k writes.
>
> Hmmm, you need fuse cvs user space but it seems I can reproduce the
> problem.

If the application is doing 4k writes, then fuse can't merge them into
bigger requests. Usually it is possible to give a hint to the
application about optimal write size by setting st_blksize in
->getattr().

Miklos

From: Brian W. <ywa...@ho...> - 2008-06-17 13:51:49

I used dd with block size 1M,

I will try to set blksize and see what happens.

From: Szabolcs S. <sz...@nt...> - 2008-06-17 14:51:47

On Tue, 17 Jun 2008, Brian Wang wrote:

> I used dd with block size 1M,

I used that too and checked that dd indeed uses 1M write blocks
with strace.

> I will try to set blksize and see what happens.

I think you mean st_blksize. The 'blksize' FUSE option should be
irrelevant.

I tried now setting st_blksize to 1M too but the writes still use
4kB blocks. I use the git kernel with fuse cvs.

	Szaka

--
NTFS-3G: http://ntfs-3g.org

From: Miklos S. <mi...@sz...> - 2008-06-17 16:11:55

> > I used dd with block size 1M,
>
> I used that too and checked that dd indeed uses 1M write blocks
> with strace.
>
> > I will try to set blksize and see what happens.
>
> I think you mean st_blksize. The 'blksize' FUSE option should be
> irrelevant.
>
> I tried now setting st_blksize to 1M too but the writes still use
> 4kB blocks. I use the git kernel with fuse cvs.

It was a bug in the max_write (and max_read) calculations. I've
submitted a fix. Thanks guys!

Miklos

From: Szabolcs S. <sz...@nt...> - 2008-06-17 18:59:14

On Tue, 17 Jun 2008, Miklos Szeredi wrote:

> It was a bug in the max_write (and max_read) calculations. I've
> submitted a fix. Thanks guys!

It's ok now, thanks!

NTFS-3G write performance 6x better on a T9300 and quick tests seem to show
that it's at least as fast as ext3 (but this needs more testing, etc).

	Szaka

--
NTFS-3G: http://ntfs-3g.org

From: Szabolcs S. <sz...@nt...> - 2008-06-29 15:19:07

On Tue, 17 Jun 2008, Szabolcs Szakacsits wrote:

> NTFS-3G write performance 6x better on a T9300 and quick tests seem to show
> that it's at least as fast as ext3 (but this needs more testing, etc).

More detailed performance results. All numbers are in MB/sec. Please note
that the current scalability limit is at 128 kB block size.

                           tmpfs    tmpfs
block              tmpfs     loop     loop  ramdisk  ramdisk   blkdev   blkdev
size     tmpfs   ntfs-3g     ext3  ntfs-3g  ntfs-3g     ext3  ntfs-3g     ext3

512        421        16      182       16       15      170       16      128
1k         613        32      231       29       28      249       31      287
2k         775        52      260       51       49      338       57      393
4k         898        88      338       74       89      446       98      545
8k         949       185      352      103      152      468      174      579
16k        973       255      358      121      233      488      289      593
32k        964       388      375      171      326      486      395      603
64k        971       556      355      192      389      491      515      613
128k       977       687      369      185      444      494      665      621
256k       979       683      366      199      439      501      661      622
512k       979       685      376      201      432      499      665      618
1M         977       689      382      203      425      492      644      625

Thanks,
	Szaka

--
NTFS-3G: http://ntfs-3g.org

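The thread does not say how the block-size sweep above was measured. As a rough sketch only, and not the method used for these numbers, the function below writes a file with write() calls of a fixed size, which is roughly what dd does with bs=. The name example_write_in_blocks() is hypothetical, and timing (for example with clock_gettime()) would have to be added around the call to obtain MB/sec figures.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Write `total` zero bytes to `path` using write() calls of at most
 * `block_size` bytes each. Returns the number of write() calls issued,
 * or -1 on error. Hypothetical helper for illustration only.
 */
long example_write_in_blocks(const char *path, size_t block_size, size_t total)
{
    assert(path != NULL && block_size > 0);

    char *buf = calloc(1, block_size);   /* zero-filled source buffer */
    if (buf == NULL)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    long calls = 0;
    size_t done = 0;
    while (done < total) {
        size_t left = total - done;
        size_t want = left < block_size ? left : block_size;
        ssize_t n = write(fd, buf, want);
        if (n <= 0) {                     /* treat errors and zero writes as failure */
            calls = -1;
            break;
        }
        done += (size_t)n;                /* a short write is simply continued */
        calls++;
    }

    if (close(fd) != 0)
        calls = -1;
    free(buf);
    return calls;
}
```

The size passed to write() is what the application asks for. The size of the requests the FUSE filesystem actually receives can be smaller, which is the behaviour discussed earlier in this thread, so the two should be checked separately.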
From: Brian W. <ywa...@ho...> - 2008-06-30 15:05:28

Sorry, my bad, I should have looked in fuse/lib/modules.

From: Brian W. <ywa...@ho...> - 2008-06-30 15:00:54

Miklos,

I just noticed that the files under fuse/modules in cvs were removed. Were
they deleted or moved somewhere else?

Thanks

brian

From: Brian W. <ywa...@ho...> - 2008-06-17 01:59:28

Also, I believe a problem I have been running into has something to do with
the NFS-related patch. The reason is that when I used the CVS version for
the same tests, it never happened over a very long time. One day after I
switched to 2.6.26-rc6 with the NFS patch, I have constantly run into it.

My test case is very simple.

Setup: kernel 2.6.26-rc6 with the NFS kernel patch copied from 2.6.26-rc5
mm2, and userland with the patch you posted a few days ago.

Test: use a tar file generated from a compiled Linux kernel tree, which is
over 1GB, and keep doing "tar xvf kernel.tar", or simply do "md5sum". I run
into input/output errors very often.

The same NFS clients and the same test with the CVS version never had the
problem.

Thanks.

Brian

From: Brian W. <ywa...@ho...> - 2008-06-16 16:18:06

I had lots of problems with the mm patches because of hardware.

To avoid the mm patches, I tried copying the fuse kernel code from 2.6.26
mm2 into 2.6.26-rc6 (the files under fs/fuse/ and the header
include/linux/fuse.h), and it seems to work; at least it doesn't have the
unrelated problems I ran into with the mm patches. You need the userland
patch Miklos posted before.

Maybe Miklos can tell us if there are potential problems with copying the
fuse code from mm patch into the main line tree?

Thanks

Brian

From: Miklos S. <mi...@sz...> - 2008-06-18 08:38:25

> Maybe Miklos can tell us if there are potential problems with copying the
> fuse code from mm patch into the main line tree?

Don't do it. You may occasionally get a kernel that compiles, and may
even appear to work, but generally it's a very very bad idea to mix
code from different kernel versions.

Here's a patch with all the nfs export patches folded into one, on top
of 2.6.26-rc5. The -rc kernels (especially the late ones like this)
should have far fewer bugs than the -mm kernels.

Miklos

---
 fs/dlm/plock.c       |    2
 fs/fuse/dir.c        |  139 +++++++++++++++++++++++++---------------
 fs/fuse/file.c       |   11 ++-
 fs/fuse/fuse_i.h     |   10 ++
 fs/fuse/inode.c      |  177 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/lockd/clntproc.c  |   10 ++
 fs/lockd/svclock.c   |   13 +--
 fs/locks.c           |   90 ++++++++++----------------
 fs/nfsd/lockd.c      |   13 ++-
 include/linux/fs.h   |    6 +
 include/linux/fuse.h |    3
 11 files changed, 355 insertions(+), 119 deletions(-)

Index: linux-2.6.26-rc5/fs/lockd/clntproc.c
===================================================================
--- linux-2.6.26-rc5.orig/fs/lockd/clntproc.c	2008-06-11 09:59:55.000000000 +0200
+++ linux-2.6.26-rc5/fs/lockd/clntproc.c	2008-06-11 10:14:30.000000000 +0200
@@ -580,7 +580,15 @@ again:
 	}
 	if (status < 0)
 		goto out_unlock;
-	status = nlm_stat_to_errno(resp->status);
+	/*
+	 * EAGAIN doesn't make sense for sleeping locks, and in some
+	 * cases NLM_LCK_DENIED is returned for a permanent error. So
+	 * turn it into an ENOLCK.
+	 */
+	if (resp->status == nlm_lck_denied && (fl_flags & FL_SLEEP))
+		status = -ENOLCK;
+	else
+		status = nlm_stat_to_errno(resp->status);
 out_unblock:
 	nlmclnt_finish_block(block);
 out:
Index: linux-2.6.26-rc5/fs/nfsd/lockd.c
===================================================================
--- linux-2.6.26-rc5.orig/fs/nfsd/lockd.c	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.26-rc5/fs/nfsd/lockd.c	2008-06-11 10:14:30.000000000 +0200
@@ -19,6 +19,13 @@
 #define NFSDDBG_FACILITY NFSDDBG_LOCKD
+#ifdef CONFIG_LOCKD_V4
+#define nlm_stale_fh nlm4_stale_fh
+#define nlm_failed nlm4_failed
+#else
+#define nlm_stale_fh nlm_lck_denied_nolocks
+#define nlm_failed nlm_lck_denied_nolocks
+#endif
 /*
  * Note: we hold the dentry use count while the file is open.
  */
@@ -47,12 +54,10 @@ nlm_fopen(struct svc_rqst *rqstp, struct
 		return 0;
 	case nfserr_dropit:
 		return nlm_drop_reply;
-#ifdef CONFIG_LOCKD_V4
 	case nfserr_stale:
-		return nlm4_stale_fh;
-#endif
+		return nlm_stale_fh;
 	default:
-		return nlm_lck_denied;
+		return nlm_failed;
 	}
 }
Index: linux-2.6.26-rc5/fs/dlm/plock.c
===================================================================
--- linux-2.6.26-rc5.orig/fs/dlm/plock.c	2008-06-11 09:59:55.000000000 +0200
+++ linux-2.6.26-rc5/fs/dlm/plock.c	2008-06-11 10:14:30.000000000 +0200
@@ -116,7 +116,7 @@ int dlm_posix_lock(dlm_lockspace_t *lock
 	if (xop->callback == NULL)
 		wait_event(recv_wq, (op->done != 0));
 	else {
-		rv = -EINPROGRESS;
+		rv = FILE_LOCK_DEFERRED;
 		goto out;
 	}
Index: linux-2.6.26-rc5/fs/lockd/svclock.c
===================================================================
--- linux-2.6.26-rc5.orig/fs/lockd/svclock.c	2008-06-11 09:59:56.000000000 +0200
+++ linux-2.6.26-rc5/fs/lockd/svclock.c	2008-06-11 10:14:30.000000000 +0200
@@ -423,8 +423,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru
 			goto out;
 		case -EAGAIN:
 			ret = nlm_lck_denied;
-			break;
-		case -EINPROGRESS:
+			goto out;
+		case FILE_LOCK_DEFERRED:
 			if (wait)
 				break;
 			/* Filesystem lock operation is in progress
@@ -439,10 +439,6 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru
 			goto out;
 	}
-	ret = nlm_lck_denied;
-	if (!wait)
-		goto out;
-
 	ret = nlm_lck_blocked;
 	/* Append to list of blocked */
@@ -520,7 +516,7 @@ nlmsvc_testlock(struct svc_rqst *rqstp,
 	}
 	error = vfs_test_lock(file->f_file, &lock->fl);
-	if (error == -EINPROGRESS) {
+	if (error == FILE_LOCK_DEFERRED) {
 		ret = nlmsvc_defer_lock_rqst(rqstp, block);
 		goto out;
 	}
@@ -744,8 +740,7 @@ nlmsvc_grant_blocked(struct nlm_block *b
 	switch (error) {
 	case 0:
 		break;
-	case -EAGAIN:
-	case -EINPROGRESS:
+	case FILE_LOCK_DEFERRED:
 		dprintk("lockd: lock still blocked error %d\n", error);
 		nlmsvc_insert_block(block, NLM_NEVER);
 		nlmsvc_release_block(block);
Index: linux-2.6.26-rc5/fs/locks.c
===================================================================
--- linux-2.6.26-rc5.orig/fs/locks.c	2008-06-11 09:59:56.000000000 +0200
+++ linux-2.6.26-rc5/fs/locks.c	2008-06-11 10:14:30.000000000 +0200
@@ -785,8 +785,10 @@ find_conflict:
 		if (!flock_locks_conflict(request, fl))
 			continue;
 		error = -EAGAIN;
-		if (request->fl_flags & FL_SLEEP)
-			locks_insert_block(fl, request);
+		if (!(request->fl_flags & FL_SLEEP))
+			goto out;
+		error = FILE_LOCK_DEFERRED;
+		locks_insert_block(fl, request);
 		goto out;
 	}
 	if (request->fl_flags & FL_ACCESS)
@@ -842,7 +844,7 @@ static int __posix_lock_file(struct inod
 			error = -EDEADLK;
 			if (posix_locks_deadlock(request, fl))
 				goto out;
-			error = -EAGAIN;
+			error = FILE_LOCK_DEFERRED;
 			locks_insert_block(fl, request);
 			goto out;
 		}
@@ -1041,7 +1043,7 @@ int posix_lock_file_wait(struct file *fi
 	might_sleep ();
 	for (;;) {
 		error = posix_lock_file(filp, fl, NULL);
-		if ((error != -EAGAIN) || !(fl->fl_flags & FL_SLEEP))
+		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
 		if (!error)
@@ -1113,9 +1115,7 @@ int locks_mandatory_area(int read_write,
 	for (;;) {
 		error = __posix_lock_file(inode, &fl, NULL);
-		if (error != -EAGAIN)
-			break;
-		if (!(fl.fl_flags & FL_SLEEP))
+		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = wait_event_interruptible(fl.fl_wait, !fl.fl_next);
 		if (!error) {
@@ -1537,7 +1537,7 @@ int flock_lock_file_wait(struct file *fi
 	might_sleep();
 	for (;;) {
 		error = flock_lock_file(filp, fl);
-		if ((error != -EAGAIN) || !(fl->fl_flags & FL_SLEEP))
+		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
 		if (!error)
@@ -1722,17 +1722,17 @@ out:
  * fl_grant is set. Callers expecting ->lock() to return asynchronously
  * will only use F_SETLK, not F_SETLKW; they will set FL_SLEEP if (and only if)
  * the request is for a blocking lock. When ->lock() does return asynchronously,
- * it must return -EINPROGRESS, and call ->fl_grant() when the lock
+ * it must return FILE_LOCK_DEFERRED, and call ->fl_grant() when the lock
  * request completes.
  * If the request is for non-blocking lock the file system should return
- * -EINPROGRESS then try to get the lock and call the callback routine with
- * the result. If the request timed out the callback routine will return a
+ * FILE_LOCK_DEFERRED then try to get the lock and call the callback routine
+ * with the result. If the request timed out the callback routine will return a
  * nonzero return code and the file system should release the lock. The file
  * system is also responsible to keep a corresponding posix lock when it
  * grants a lock so the VFS can find out which locks are locally held and do
  * the correct lock cleanup when required.
  * The underlying filesystem must not drop the kernel lock or call
- * ->fl_grant() before returning to the caller with a -EINPROGRESS
+ * ->fl_grant() before returning to the caller with a FILE_LOCK_DEFERRED
  * return code.
  */
 int vfs_lock_file(struct file *filp, unsigned int cmd, struct file_lock *fl, struct file_lock *conf)
@@ -1744,6 +1744,30 @@ int vfs_lock_file(struct file *filp, uns
 }
 EXPORT_SYMBOL_GPL(vfs_lock_file);
+static int do_lock_file_wait(struct file *filp, unsigned int cmd,
+			     struct file_lock *fl)
+{
+	int error;
+
+	error = security_file_lock(filp, fl->fl_type);
+	if (error)
+		return error;
+
+	for (;;) {
+		error = vfs_lock_file(filp, cmd, fl, NULL);
+		if (error != FILE_LOCK_DEFERRED)
+			break;
+		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
+		if (!error)
+			continue;
+
+		locks_delete_block(fl);
+		break;
+	}
+
+	return error;
+}
+
 /* Apply the lock described by l to an open file descriptor.
  * This implements both the F_SETLK and F_SETLKW commands of fcntl().
  */
@@ -1801,26 +1825,7 @@ again:
 		goto out;
 	}
-	error = security_file_lock(filp, file_lock->fl_type);
-	if (error)
-		goto out;
-
-	if (filp->f_op && filp->f_op->lock != NULL)
-		error = filp->f_op->lock(filp, cmd, file_lock);
-	else {
-		for (;;) {
-			error = posix_lock_file(filp, file_lock, NULL);
-			if (error != -EAGAIN || cmd == F_SETLK)
-				break;
-			error = wait_event_interruptible(file_lock->fl_wait,
-					!file_lock->fl_next);
-			if (!error)
-				continue;
-
-			locks_delete_block(file_lock);
-			break;
-		}
-	}
+	error = do_lock_file_wait(filp, cmd, file_lock);
 	/*
 	 * Attempt to detect a close/fcntl race and recover by
@@ -1938,26 +1943,7 @@ again:
 		goto out;
 	}
-	error = security_file_lock(filp, file_lock->fl_type);
-	if (error)
-		goto out;
-
-	if (filp->f_op && filp->f_op->lock != NULL)
-		error = filp->f_op->lock(filp, cmd, file_lock);
-	else {
-		for (;;) {
-			error = posix_lock_file(filp, file_lock, NULL);
-			if (error != -EAGAIN || cmd == F_SETLK64)
-				break;
-			error = wait_event_interruptible(file_lock->fl_wait,
-					!file_lock->fl_next);
-			if (!error)
-				continue;
-
-			locks_delete_block(file_lock);
-			break;
-		}
-	}
+	error = do_lock_file_wait(filp, cmd, file_lock);
 	/*
 	 * Attempt to detect a close/fcntl race and recover by
Index: linux-2.6.26-rc5/include/linux/fs.h
===================================================================
--- linux-2.6.26-rc5.orig/include/linux/fs.h	2008-06-11 10:06:02.000000000 +0200
+++ linux-2.6.26-rc5/include/linux/fs.h	2008-06-11 10:14:30.000000000 +0200
@@ -890,6 +890,12 @@ static inline int file_check_writeable(s
 #define FL_SLEEP 128 /* A blocking lock */
 /*
+ * Special return value from posix_lock_file() and vfs_lock_file() for
+ * asynchronous locking.
+ */
+#define FILE_LOCK_DEFERRED 1
+
+/*
  * The POSIX file lock owner is determined by
  * the "struct files_struct" in the thread group
  * (or NULL for no owner - BSD locks).
Index: linux-2.6.26-rc5/fs/fuse/dir.c
===================================================================
--- linux-2.6.26-rc5.orig/fs/fuse/dir.c	2008-06-11 09:59:55.000000000 +0200
+++ linux-2.6.26-rc5/fs/fuse/dir.c	2008-06-11 10:14:30.000000000 +0200
@@ -97,7 +97,7 @@ void fuse_invalidate_attr(struct inode *
  * timeout is unknown (unlink, rmdir, rename and in some cases
  * lookup)
  */
-static void fuse_invalidate_entry_cache(struct dentry *entry)
+void fuse_invalidate_entry_cache(struct dentry *entry)
 {
 	fuse_dentry_settime(entry, 0);
 }
@@ -112,18 +112,16 @@ static void fuse_invalidate_entry(struct
 	fuse_invalidate_entry_cache(entry);
 }
-static void fuse_lookup_init(struct fuse_req *req, struct inode *dir,
-			     struct dentry *entry,
+static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_req *req,
+			     u64 nodeid, struct qstr *name,
 			     struct fuse_entry_out *outarg)
 {
-	struct fuse_conn *fc = get_fuse_conn(dir);
-
 	memset(outarg, 0, sizeof(struct fuse_entry_out));
 	req->in.h.opcode = FUSE_LOOKUP;
-	req->in.h.nodeid = get_node_id(dir);
+	req->in.h.nodeid = nodeid;
 	req->in.numargs = 1;
-	req->in.args[0].size = entry->d_name.len + 1;
-	req->in.args[0].value = entry->d_name.name;
+	req->in.args[0].size = name->len + 1;
+	req->in.args[0].value = name->name;
 	req->out.numargs = 1;
 	if (fc->minor < 9)
 		req->out.args[0].size = FUSE_COMPAT_ENTRY_OUT_SIZE;
@@ -189,7 +187,8 @@ static int fuse_dentry_revalidate(struct
 		attr_version = fuse_get_attr_version(fc);
 		parent = dget_parent(entry);
-		fuse_lookup_init(req, parent->d_inode, entry, &outarg);
+		fuse_lookup_init(fc, req, get_node_id(parent->d_inode),
+				 &entry->d_name, &outarg);
 		request_send(fc, req);
 		dput(parent);
 		err = req->out.h.error;
@@ -225,7 +224,7 @@ static int invalid_nodeid(u64 nodeid)
 	return !nodeid || nodeid == FUSE_ROOT_ID;
 }
-static struct dentry_operations fuse_dentry_operations = {
+struct dentry_operations fuse_dentry_operations = {
 	.d_revalidate = fuse_dentry_revalidate,
 };
@@ -239,85 +238,127 @@ int fuse_valid_type(int m)
  * Add a directory inode to a dentry, ensuring that no other dentry
  * refers to this inode. Called with fc->inst_mutex.
  */
-static int fuse_d_add_directory(struct dentry *entry, struct inode *inode)
+static struct dentry *fuse_d_add_directory(struct dentry *entry,
+					   struct inode *inode)
 {
 	struct dentry *alias = d_find_alias(inode);
-	if (alias) {
+	if (alias && !(alias->d_flags & DCACHE_DISCONNECTED)) {
 		/* This tries to shrink the subtree below alias */
 		fuse_invalidate_entry(alias);
 		dput(alias);
 		if (!list_empty(&inode->i_dentry))
-			return -EBUSY;
+			return ERR_PTR(-EBUSY);
+	} else {
+		dput(alias);
 	}
-	d_add(entry, inode);
-	return 0;
+	return d_splice_alias(inode, entry);
 }
-static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
-				  struct nameidata *nd)
+int fuse_lookup_name(struct super_block *sb, u64 nodeid, struct qstr *name,
+		     struct fuse_entry_out *outarg, struct inode **inode)
 {
-	int err;
-	struct fuse_entry_out outarg;
-	struct inode *inode = NULL;
-	struct fuse_conn *fc = get_fuse_conn(dir);
+	struct fuse_conn *fc = get_fuse_conn_super(sb);
 	struct fuse_req *req;
 	struct fuse_req *forget_req;
 	u64 attr_version;
+	int err;
-	if (entry->d_name.len > FUSE_NAME_MAX)
-		return ERR_PTR(-ENAMETOOLONG);
+	*inode = NULL;
+	err = -ENAMETOOLONG;
+	if (name->len > FUSE_NAME_MAX)
+		goto out;
 	req = fuse_get_req(fc);
+	err = PTR_ERR(req);
 	if (IS_ERR(req))
-		return ERR_CAST(req);
+		goto out;
 	forget_req = fuse_get_req(fc);
+	err = PTR_ERR(forget_req);
 	if (IS_ERR(forget_req)) {
 		fuse_put_request(fc, req);
-		return ERR_CAST(forget_req);
+		goto out;
 	}
 	attr_version = fuse_get_attr_version(fc);
-	fuse_lookup_init(req, dir, entry, &outarg);
+	fuse_lookup_init(fc, req, nodeid, name, outarg);
 	request_send(fc, req);
 	err = req->out.h.error;
 	fuse_put_request(fc, req);
 	/* Zero nodeid is same as -ENOENT, but with valid timeout */
-	if (!err && outarg.nodeid &&
-	    (invalid_nodeid(outarg.nodeid) ||
-	     !fuse_valid_type(outarg.attr.mode)))
-		err = -EIO;
-	if (!err && outarg.nodeid) {
-		inode = fuse_iget(dir->i_sb, outarg.nodeid, outarg.generation,
-				  &outarg.attr, entry_attr_timeout(&outarg),
-				  attr_version);
-		if (!inode) {
-			fuse_send_forget(fc, forget_req, outarg.nodeid, 1);
-			return ERR_PTR(-ENOMEM);
-		}
+	if (err || !outarg->nodeid)
+		goto out_put_forget;
+
+	err = -EIO;
+	if (!outarg->nodeid)
+		goto out_put_forget;
+	if (!fuse_valid_type(outarg->attr.mode))
+		goto out_put_forget;
+
+	*inode = fuse_iget(sb, outarg->nodeid, outarg->generation,
+			   &outarg->attr, entry_attr_timeout(outarg),
+			   attr_version);
+	err = -ENOMEM;
+	if (!*inode) {
+		fuse_send_forget(fc, forget_req, outarg->nodeid, 1);
+		goto out;
 	}
+	err = 0;
+
+ out_put_forget:
 	fuse_put_request(fc, forget_req);
-	if (err && err != -ENOENT)
-		return ERR_PTR(err);
+ out:
+	return err;
+}
+
+static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
+				  struct nameidata *nd)
+{
+	int err;
+	struct fuse_entry_out outarg;
+	struct inode *inode;
+	struct dentry *newent;
+	struct fuse_conn *fc = get_fuse_conn(dir);
+	bool outarg_valid = true;
+
+	err = fuse_lookup_name(dir->i_sb, get_node_id(dir), &entry->d_name,
+			       &outarg, &inode);
+	if (err == -ENOENT) {
+		outarg_valid = false;
+		err = 0;
+	}
+	if (err)
+		goto out_err;
+
+	err = -EIO;
+	if (inode && get_node_id(inode) == FUSE_ROOT_ID)
+		goto out_iput;
 	if (inode && S_ISDIR(inode->i_mode)) {
 		mutex_lock(&fc->inst_mutex);
-		err = fuse_d_add_directory(entry, inode);
+		newent = fuse_d_add_directory(entry, inode);
 		mutex_unlock(&fc->inst_mutex);
-		if (err) {
-			iput(inode);
-			return ERR_PTR(err);
-		}
-	} else
-		d_add(entry, inode);
+		err = PTR_ERR(newent);
+		if (IS_ERR(newent))
+			goto out_iput;
+	} else {
+		newent = d_splice_alias(inode, entry);
+	}
+	entry = newent ? newent : entry;
 	entry->d_op = &fuse_dentry_operations;
-	if (!err)
+	if (outarg_valid)
 		fuse_change_entry_timeout(entry, &outarg);
 	else
 		fuse_invalidate_entry_cache(entry);
-	return NULL;
+
+	return newent;
+
+ out_iput:
+	iput(inode);
+ out_err:
+	return ERR_PTR(err);
 }
 /*
Index: linux-2.6.26-rc5/fs/fuse/fuse_i.h
===================================================================
--- linux-2.6.26-rc5.orig/fs/fuse/fuse_i.h	2008-06-11 09:59:55.000000000 +0200
+++ linux-2.6.26-rc5/fs/fuse/fuse_i.h	2008-06-11 10:14:30.000000000 +0200
@@ -363,6 +363,9 @@ struct fuse_conn {
 	/** Do not send separate SETATTR request before open(O_TRUNC) */
 	unsigned atomic_o_trunc : 1;
+	/** Filesystem supports NFS exporting.
Only set in INIT */ + unsigned export_support : 1; + /* * The following bitfields are only for optimization purposes * and hence races in setting them will not cause malfunction @@ -464,6 +467,8 @@ static inline u64 get_node_id(struct ino /** Device operations */ extern const struct file_operations fuse_dev_operations; +extern struct dentry_operations fuse_dentry_operations; + /** * Get a filled in inode */ @@ -471,6 +476,9 @@ struct inode *fuse_iget(struct super_blo int generation, struct fuse_attr *attr, u64 attr_valid, u64 attr_version); +int fuse_lookup_name(struct super_block *sb, u64 nodeid, struct qstr *name, + struct fuse_entry_out *outarg, struct inode **inode); + /** * Send FORGET command */ @@ -604,6 +612,8 @@ void fuse_abort_conn(struct fuse_conn *f */ void fuse_invalidate_attr(struct inode *inode); +void fuse_invalidate_entry_cache(struct dentry *entry); + /** * Acquire reference to fuse_conn */ Index: linux-2.6.26-rc5/fs/fuse/inode.c =================================================================== --- linux-2.6.26-rc5.orig/fs/fuse/inode.c 2008-06-11 10:06:02.000000000 +0200 +++ linux-2.6.26-rc5/fs/fuse/inode.c 2008-06-11 10:14:30.000000000 +0200 @@ -18,6 +18,7 @@ #include <linux/statfs.h> #include <linux/random.h> #include <linux/sched.h> +#include <linux/exportfs.h> MODULE_AUTHOR("Miklos Szeredi <mi...@sz...>"); MODULE_DESCRIPTION("Filesystem in Userspace"); @@ -569,6 +570,174 @@ static struct inode *get_root_inode(stru return fuse_iget(sb, 1, 0, &attr, 0, 0); } +struct fuse_inode_handle +{ + u64 nodeid; + u32 generation; +}; + +static struct dentry *fuse_get_dentry(struct super_block *sb, + struct fuse_inode_handle *handle) +{ + struct fuse_conn *fc = get_fuse_conn_super(sb); + struct inode *inode; + struct dentry *entry; + int err = -ESTALE; + + if (handle->nodeid == 0) + goto out_err; + + inode = ilookup5(sb, handle->nodeid, fuse_inode_eq, &handle->nodeid); + if (!inode) { + struct fuse_entry_out outarg; + struct qstr name; + + if 
(!fc->export_support) + goto out_err; + + name.len = 1; + name.name = "."; + err = fuse_lookup_name(sb, handle->nodeid, &name, &outarg, + &inode); + if (err && err != -ENOENT) + goto out_err; + if (err || !inode) { + err = -ESTALE; + goto out_err; + } + err = -EIO; + if (get_node_id(inode) != handle->nodeid) + goto out_iput; + } + err = -ESTALE; + if (inode->i_generation != handle->generation) + goto out_iput; + + entry = d_alloc_anon(inode); + err = -ENOMEM; + if (!entry) + goto out_iput; + + if (get_node_id(inode) != FUSE_ROOT_ID) { + entry->d_op = &fuse_dentry_operations; + fuse_invalidate_entry_cache(entry); + } + + return entry; + + out_iput: + iput(inode); + out_err: + return ERR_PTR(err); +} + +static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len, + int connectable) +{ + struct inode *inode = dentry->d_inode; + bool encode_parent = connectable && !S_ISDIR(inode->i_mode); + int len = encode_parent ? 6 : 3; + u64 nodeid; + u32 generation; + + if (*max_len < len) + return 255; + + nodeid = get_fuse_inode(inode)->nodeid; + generation = inode->i_generation; + + fh[0] = (u32)(nodeid >> 32); + fh[1] = (u32)(nodeid & 0xffffffff); + fh[2] = generation; + + if (encode_parent) { + struct inode *parent; + + spin_lock(&dentry->d_lock); + parent = dentry->d_parent->d_inode; + nodeid = get_fuse_inode(parent)->nodeid; + generation = parent->i_generation; + spin_unlock(&dentry->d_lock); + + fh[3] = (u32)(nodeid >> 32); + fh[4] = (u32)(nodeid & 0xffffffff); + fh[5] = generation; + } + + *max_len = len; + return encode_parent ? 
0x82 : 0x81; +} + +static struct dentry *fuse_fh_to_dentry(struct super_block *sb, + struct fid *fid, int fh_len, int fh_type) +{ + struct fuse_inode_handle handle; + + if ((fh_type != 0x81 && fh_type != 0x82) || fh_len < 3) + return NULL; + + handle.nodeid = (u64) fid->raw[0] << 32; + handle.nodeid |= (u64) fid->raw[1]; + handle.generation = fid->raw[2]; + return fuse_get_dentry(sb, &handle); +} + +static struct dentry *fuse_fh_to_parent(struct super_block *sb, + struct fid *fid, int fh_len, int fh_type) +{ + struct fuse_inode_handle parent; + + if (fh_type != 0x82 || fh_len < 6) + return NULL; + + parent.nodeid = (u64) fid->raw[3] << 32; + parent.nodeid |= (u64) fid->raw[4]; + parent.generation = fid->raw[5]; + return fuse_get_dentry(sb, &parent); +} + +static struct dentry *fuse_get_parent(struct dentry *child) +{ + struct inode *child_inode = child->d_inode; + struct fuse_conn *fc = get_fuse_conn(child_inode); + struct inode *inode; + struct dentry *parent; + struct fuse_entry_out outarg; + struct qstr name; + int err; + + if (!fc->export_support) + return ERR_PTR(-ESTALE); + + name.len = 2; + name.name = ".."; + err = fuse_lookup_name(child_inode->i_sb, get_node_id(child_inode), + &name, &outarg, &inode); + if (err && err != -ENOENT) + return ERR_PTR(err); + if (err || !inode) + return ERR_PTR(-ESTALE); + + parent = d_alloc_anon(inode); + if (!parent) { + iput(inode); + return ERR_PTR(-ENOMEM); + } + if (get_node_id(inode) != FUSE_ROOT_ID) { + parent->d_op = &fuse_dentry_operations; + fuse_invalidate_entry_cache(parent); + } + + return parent; +} + +static const struct export_operations fuse_export_operations = { + .fh_to_dentry = fuse_fh_to_dentry, + .fh_to_parent = fuse_fh_to_parent, + .encode_fh = fuse_encode_fh, + .get_parent = fuse_get_parent, +}; + static const struct super_operations fuse_super_operations = { .alloc_inode = fuse_alloc_inode, .destroy_inode = fuse_destroy_inode, @@ -598,6 +767,11 @@ static void process_init_reply(struct fu fc->no_lock = 
1; if (arg->flags & FUSE_ATOMIC_O_TRUNC) fc->atomic_o_trunc = 1; + if (arg->minor >= 9) { + /* LOOKUP has dependency on proto version */ + if (arg->flags & FUSE_EXPORT_SUPPORT) + fc->export_support = 1; + } if (arg->flags & FUSE_BIG_WRITES) fc->big_writes = 1; } else { @@ -624,7 +798,7 @@ static void fuse_send_init(struct fuse_c arg->minor = FUSE_KERNEL_MINOR_VERSION; arg->max_readahead = fc->bdi.ra_pages * PAGE_CACHE_SIZE; arg->flags |= FUSE_ASYNC_READ | FUSE_POSIX_LOCKS | FUSE_ATOMIC_O_TRUNC | - FUSE_BIG_WRITES; + FUSE_EXPORT_SUPPORT | FUSE_BIG_WRITES; req->in.h.opcode = FUSE_INIT; req->in.numargs = 1; req->in.args[0].size = sizeof(*arg); @@ -673,6 +847,7 @@ static int fuse_fill_super(struct super_ sb->s_magic = FUSE_SUPER_MAGIC; sb->s_op = &fuse_super_operations; sb->s_maxbytes = MAX_LFS_FILESIZE; + sb->s_export_op = &fuse_export_operations; file = fget(d.fd); if (!file) Index: linux-2.6.26-rc5/include/linux/fuse.h =================================================================== --- linux-2.6.26-rc5.orig/include/linux/fuse.h 2008-06-11 09:59:57.000000000 +0200 +++ linux-2.6.26-rc5/include/linux/fuse.h 2008-06-11 10:14:30.000000000 +0200 @@ -104,11 +104,14 @@ struct fuse_file_lock { /** * INIT request/reply flags + * + * FUSE_EXPORT_SUPPORT: filesystem handles lookups of "." and ".." */ #define FUSE_ASYNC_READ (1 << 0) #define FUSE_POSIX_LOCKS (1 << 1) #define FUSE_FILE_OPS (1 << 2) #define FUSE_ATOMIC_O_TRUNC (1 << 3) +#define FUSE_EXPORT_SUPPORT (1 << 4) #define FUSE_BIG_WRITES (1 << 5) /** Index: linux-2.6.26-rc5/fs/fuse/file.c =================================================================== --- linux-2.6.26-rc5.orig/fs/fuse/file.c 2008-06-11 09:59:55.000000000 +0200 +++ linux-2.6.26-rc5/fs/fuse/file.c 2008-06-11 10:14:30.000000000 +0200 @@ -1341,6 +1341,11 @@ static int fuse_setlk(struct file *file, pid_t pid = fl->fl_type != F_UNLCK ? 
current->tgid : 0; int err; + if (fl->fl_lmops && fl->fl_lmops->fl_grant) { + /* NLM needs asynchronous locks, which we don't support yet */ + return -ENOLCK; + } + /* Unlock on close is handled by the flush method */ if (fl->fl_flags & FL_CLOSE) return 0; @@ -1365,7 +1370,9 @@ static int fuse_file_lock(struct file *f struct fuse_conn *fc = get_fuse_conn(inode); int err; - if (cmd == F_GETLK) { + if (cmd == F_CANCELLK) { + err = 0; + } else if (cmd == F_GETLK) { if (fc->no_lock) { posix_test_lock(file, fl); err = 0; @@ -1373,7 +1380,7 @@ static int fuse_file_lock(struct file *f err = fuse_getlk(file, fl); } else { if (fc->no_lock) - err = posix_lock_file_wait(file, fl); + err = posix_lock_file(file, fl, NULL); else err = fuse_setlk(file, fl, 0); } |
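For anyone reading the patch above without a kernel tree at hand, here is a rough userspace sketch of the file handle layout that fuse_encode_fh() and fuse_fh_to_dentry() appear to agree on: the 64-bit nodeid is split into two 32-bit words, high half first, followed by the generation. The names struct inode_handle, handle_pack(), handle_unpack() and handle_roundtrip_check() are made up for this illustration and do not exist in the kernel or in libfuse; only the word order is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the handle used by the patch. */
struct inode_handle {
	uint64_t nodeid;
	uint32_t generation;
};

/* Pack a handle into three 32-bit words, high half of nodeid first. */
static void handle_pack(const struct inode_handle *h, uint32_t fh[3])
{
	fh[0] = (uint32_t)(h->nodeid >> 32);
	fh[1] = (uint32_t)(h->nodeid & 0xffffffff);
	fh[2] = h->generation;
}

/* Rebuild the handle from the three words written by handle_pack(). */
static struct inode_handle handle_unpack(const uint32_t fh[3])
{
	struct inode_handle h;

	h.nodeid = (uint64_t)fh[0] << 32;
	h.nodeid |= (uint64_t)fh[1];
	h.generation = fh[2];
	return h;
}

/* Round-trip check: returns 1 on success, aborts via assert otherwise. */
static int handle_roundtrip_check(void)
{
	struct inode_handle in = { 0x0000000123456789ULL, 42 };
	struct inode_handle out;
	uint32_t fh[3];

	handle_pack(&in, fh);
	assert(fh[0] == 0x1);
	assert(fh[1] == 0x23456789);

	out = handle_unpack(fh);
	assert(out.nodeid == in.nodeid);
	assert(out.generation == in.generation);
	return 1;
}
```

Calling handle_roundtrip_check() from a small main() is enough to try it. The connectable variant in the patch simply repeats the same three words for the parent inode in fh[3] through fh[5].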