From: Martin M. <tb...@cy...> - 2006-12-20 17:57:35
Attachments:
strace
We at Debian received the following bug report saying that FUSE is not working on ARM. I've verified this on two ARM platforms (IXP4xx and IOP32x) and also checked that it's working fine on MIPS. The problem seems to be that it hangs in stat64. The report initially described that encfs hangs, but the same happens with sshfs:

* Jon Dowland <jo...@re...> [2006-12-16 20:12]:
> Following some advice on IRC:
>
> 10:58 < suihkulokki> Jon: can you try some other fuse based fs? like sshfs?
> 10:59 < suihkulokki> Jon: also one possibility is that you are out of /dev/random pool
> 10:59 < suihkulokki> Jon: so try stracing the fuse process as well
>
> I could successfully read bytes from /dev/random whilst an
> ls into an encfs mount was hanging, so I don't think it's
> that.
>
> I installed sshfs and tried that and an ls for that also
> hung.

I can confirm that ls with sshfs also hangs for me. The output of strace is:

open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40016000
read(3, "rootfs / rootfs rw 0 0\nnone /sys"..., 1024) = 525
read(3, "", 1024) = 0
close(3) = 0
munmap(0x40016000, 4096) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=36, ws_col=102, ws_xpixel=1020, ws_ypixel=720}) = 0
stat64("/mnt",

I've also attached Jon's strace output from encfs. Any idea what's going on here?

-- 
Martin Michlmayr
http://www.cyrius.com/
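The hang shown in the strace output can be probed without tying up a shell. What follows is a minimal sketch, not part of the original report: it runs stat() in a child process and reports whether the call is still blocked after a timeout. The function name stat_hangs and the 10 ms polling interval are illustrative assumptions. On an affected ARM kernel one would expect it to return 1 for a path on a FUSE mount such as /mnt.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread).
 * Returns 0 if stat(path) finished within timeout_ms (whether it
 * succeeded or failed), 1 if it was still blocked at the timeout,
 * and -1 if fork() or waitpid() failed.
 */
int stat_hangs(const char *path, int timeout_ms)
{
    assert(path != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this is the call that blocks on an affected system. */
        struct stat st;
        _exit(stat(path, &st) == 0 ? 0 : 1);
    }

    /* Parent: poll for the child every 10 ms until the timeout expires. */
    const struct timespec tick = { 0, 10 * 1000 * 1000 };
    for (int waited = 0; waited < timeout_ms; waited += 10) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 0;           /* stat() returned, so no hang */
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* Still blocked: try to clean up, but never wait for the child. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, WNOHANG);
    return 1;
}
```

A call such as stat_hangs("/mnt", 2000) gives the hung stat64 two seconds before reporting it. The parent deliberately does not block on the child after the timeout, since a process stuck inside the kernel may not react to SIGKILL.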
From: Miklos S. <mi...@sz...> - 2006-12-20 18:38:23
> We at Debian received the following bug report saying that FUSE is not
> working on ARM. I've verified this on two ARM platforms (IXP4xx and
> IOP32x) and also checked that it's working fine on MIPS. The problem
> seems to be that it hangs in stat64.

Can you please try the out-of-tree kernel module from the fuse-2.6.x package (use 'configure --enable-kernel-module')? That contains a workaround for a bug in the ARM architecture code.

If this fixes the problem for you, then I would advise you to remind the ARM maintainer (Russell King) about this issue.

Thanks,
Miklos
From: Martin M. <tb...@cy...> - 2006-12-21 08:25:05
* Miklos Szeredi <mi...@sz...> [2006-12-20 19:37]:
> > We at Debian received the following bug report saying that FUSE is not
> > working on ARM. I've verified this on two ARM platforms (IXP4xx and
> > IOP32x) and also checked that it's working fine on MIPS. The problem
> > seems to be that it hangs in stat64.
>
> Can you please try the out-of-tree kernel module from the fuse-2.6.x
> package (use 'configure --enable-kernel-module')? That contains a
> workaround for a bug in the ARM architecture code.
>
> If this fixes the problem for you, then I would advise you to remind
> the ARM maintainer (Russell King) about this issue.

The bug reporter confirms that it works with the modules from the fuse-2.6.1 package. Russell, have you addressed this problem recently or is this still an open issue? I've only tried 2.6.17 and 2.6.18 so far, nothing newer. If there's a fix available, I can put it in Debian's 2.6.18 package that will ship with our next release.

-- 
Martin Michlmayr
http://www.cyrius.com/
From: Russell K. <rm...@ar...> - 2006-12-21 09:43:44
On Thu, Dec 21, 2006 at 09:24:56AM +0100, Martin Michlmayr wrote:
> * Miklos Szeredi <mi...@sz...> [2006-12-20 19:37]:
> > > We at Debian received the following bug report saying that FUSE is not
> > > working on ARM. I've verified this on two ARM platforms (IXP4xx and
> > > IOP32x) and also checked that it's working fine on MIPS. The problem
> > > seems to be that it hangs in stat64.
> >
> > Can you please try the out-of-tree kernel module from the fuse-2.6.x
> > package (use 'configure --enable-kernel-module')? That contains a
> > workaround for a bug in the ARM architecture code.
> >
> > If this fixes the problem for you, then I would advise you to remind
> > the ARM maintainer (Russell King) about this issue.
>
> The bug reporter confirms that it works with the modules from the
> fuse-2.6.1 package. Russell, have you addressed this problem recently
> or is this still an open issue? I've only tried 2.6.17 and 2.6.18 so
> far, nothing newer. If there's a fix available, I can put it in
> Debian's 2.6.18 package that will ship with our next release.

This is the first I've heard of a problem.

What _exactly_ is the problem and can you provide a test case or instructions to reproduce it?

-- 
Russell King
From: Martin M. <tb...@cy...> - 2006-12-21 09:53:06
* Russell King <rm...@ar...> [2006-12-21 09:43]:
> > > That contains a workaround for a bug in the ARM architecture code.
> > >
> This is the first I've heard of a problem.
>
> What _exactly_ is the problem and can you provide a test case or
> instructions to reproduce it?

stat64 hangs when you try to access any filesystem mounted with FUSE. strace shows:

munmap(0x40016000, 4096) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=36, ws_col=102, ws_xpixel=1020, ws_ypixel=720}) = 0
stat64("/mnt",

and there it hangs.

I guess Miklos can provide a better description given that he put a workaround for ARM into his FUSE code.

-- 
Martin Michlmayr
http://www.cyrius.com/
From: Russell K. <rm...@ar...> - 2006-12-21 12:54:42
On Thu, Dec 21, 2006 at 10:52:53AM +0100, Martin Michlmayr wrote:
> * Russell King <rm...@ar...> [2006-12-21 09:43]:
> > > > That contains a workaround for a bug in the ARM architecture code.
> > > >
> > This is the first I've heard of a problem.
> >
> > What _exactly_ is the problem and can you provide a test case or
> > instructions to reproduce it?
>
> stat64 hangs when you try to access any filesystem mounted with FUSE.
> strace shows:
>
> munmap(0x40016000, 4096) = 0
> ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
> ioctl(1, TIOCGWINSZ, {ws_row=36, ws_col=102, ws_xpixel=1020, ws_ypixel=720}) = 0
> stat64("/mnt",
>
> and there it hangs.
>
> I guess Miklos can provide a better description given that he put a
> workaround for ARM into his FUSE code.

Hmm. It'll be a few days (or more likely the new year provided I remember) before I can solve this.

-- 
Russell King
From: Miklos S. <mi...@sz...> - 2006-12-21 10:07:03
> On Thu, Dec 21, 2006 at 09:24:56AM +0100, Martin Michlmayr wrote:
> > * Miklos Szeredi <mi...@sz...> [2006-12-20 19:37]:
> > > > We at Debian received the following bug report saying that FUSE is not
> > > > working on ARM. I've verified this on two ARM platforms (IXP4xx and
> > > > IOP32x) and also checked that it's working fine on MIPS. The problem
> > > > seems to be that it hangs in stat64.
> > >
> > > Can you please try the out-of-tree kernel module from the fuse-2.6.x
> > > package (use 'configure --enable-kernel-module')? That contains a
> > > workaround for a bug in the ARM architecture code.
> > >
> > > If this fixes the problem for you, then I would advise you to remind
> > > the ARM maintainer (Russell King) about this issue.
> >
> > The bug reporter confirms that it works with the modules from the
> > fuse-2.6.1 package. Russell, have you addressed this problem recently
> > or is this still an open issue? I've only tried 2.6.17 and 2.6.18 so
> > far, nothing newer. If there's a fix available, I can put it in
> > Debian's 2.6.18 package that will ship with our next release.
>
> This is the first I've heard of a problem.
>
> What _exactly_ is the problem and can you provide a test case or
> instructions to reproduce it?

This is the dcache aliasing issue in get_user_pages() for anonymous pages:

http://lkml.org/lkml/2006/10/7/80

The reason why this only shows with FUSE is that ptrace() does its own redundant cache flushing, and other users of get_user_pages() like SCSI and NFS direct-IO probably get less exposure on ARM than FUSE.

To reproduce, build a kernel with CONFIG_FUSE_FS, build the fuse package (http://downloads.sourceforge.net/fuse/fuse-2.6.1.tar.gz) and run one of the example filesystems.

Miklos
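For readers unfamiliar with the term, "aliasing" here means that one physical page is reachable through two different virtual addresses. The user-space sketch below only illustrates that idea and is not a reproduction of the bug: it maps one page of a temporary file twice, writes through the first mapping, and reads through the second. The function name read_through_alias is an illustrative assumption. For shared user mappings like these the kernel is expected to keep both views coherent, so the read should return the value just written. The problem discussed in this thread concerns the kernel's own mapping of a page versus the user's mapping, which user-space code cannot exercise directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread).
 * Maps the same file page at two virtual addresses, stores `value`
 * through the first mapping and loads it back through the second.
 * Returns the byte read, or -1 if the setup failed.
 */
int read_through_alias(unsigned char value)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    if (f == NULL || page <= 0)
        return -1;

    int fd = fileno(f);
    if (ftruncate(fd, page) != 0) {
        fclose(f);
        return -1;
    }

    /* Two independent mappings that refer to the same underlying page. */
    unsigned char *a = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    unsigned char *b = mmap(NULL, (size_t)page, PROT_READ,
                            MAP_SHARED, fd, 0);

    int result = -1;
    if (a != MAP_FAILED && b != MAP_FAILED) {
        assert(a != b);     /* different virtual addresses */
        a[0] = value;       /* write via the first alias */
        result = b[0];      /* read via the second alias */
    }

    if (a != MAP_FAILED)
        munmap(a, (size_t)page);
    if (b != MAP_FAILED)
        munmap(b, (size_t)page);
    fclose(f);
    return result;
}
```

If a cache is indexed by virtual address and nothing keeps the two views in sync, the read through the second address could see stale data. That is the class of problem the flushing discussed in this thread is meant to prevent.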
From: Russell K. <rm...@ar...> - 2006-12-21 14:40:02
On Thu, Dec 21, 2006 at 11:05:49AM +0100, Miklos Szeredi wrote:
> > On Thu, Dec 21, 2006 at 09:24:56AM +0100, Martin Michlmayr wrote:
> > > * Miklos Szeredi <mi...@sz...> [2006-12-20 19:37]:
> > > > > We at Debian received the following bug report saying that FUSE is not
> > > > > working on ARM. I've verified this on two ARM platforms (IXP4xx and
> > > > > IOP32x) and also checked that it's working fine on MIPS. The problem
> > > > > seems to be that it hangs in stat64.
> > > >
> > > > Can you please try the out-of-tree kernel module from the fuse-2.6.x
> > > > package (use 'configure --enable-kernel-module')? That contains a
> > > > workaround for a bug in the ARM architecture code.
> > > >
> > > > If this fixes the problem for you, then I would advise you to remind
> > > > the ARM maintainer (Russell King) about this issue.
> > >
> > > The bug reporter confirms that it works with the modules from the
> > > fuse-2.6.1 package. Russell, have you addressed this problem recently
> > > or is this still an open issue? I've only tried 2.6.17 and 2.6.18 so
> > > far, nothing newer. If there's a fix available, I can put it in
> > > Debian's 2.6.18 package that will ship with our next release.
> >
> > This is the first I've heard of a problem.
> >
> > What _exactly_ is the problem and can you provide a test case or
> > instructions to reproduce it?
>
> This is the dcache aliasing issue in get_user_pages() for anonymous
> pages:
>
> http://lkml.org/lkml/2006/10/7/80
>
> The reason why this only shows with FUSE is that ptrace() does its
> own redundant cache flushing, and other users of get_user_pages() like
> SCSI and NFS direct-IO probably get less exposure on ARM than FUSE.
>
> To reproduce, build a kernel with CONFIG_FUSE_FS, build the fuse
> package (http://downloads.sourceforge.net/fuse/fuse-2.6.1.tar.gz) and
> run one of the example filesystems.

This may be a silly question, but why is fuse attempting to use the kernel mapping to read/write the current process's userspace?

Most normal device drivers use the user accessor functions in asm/uaccess.h and this would entirely avoid the cache aliasing issues.

To get fuse to work on ARM, we would need to flush the dcache (at, I hasten to add, 32-byte intervals) over the area you wish to read, both for the kernel _and_ userspace mappings of that page.

So, to read an entire page this way, what you're looking at is:

- 128 flush instructions to flush the kernel mapping.
- 128 flush instructions to flush the user mapping.
- memcpy overhead.

whereas to use the standard uaccess functions, you're looking at just the memcpy overhead.

Therefore, I suggest that James' flush_anon_page() stuff just papers over what is actually a fuse bug - it should not be using get_user_pages() to access the current process's memory space.

-- 
Russell King
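The figure of 128 flush instructions per mapping follows from dividing the page size by the 32-byte flush interval, which implies a 4096-byte page. The page size is an inference from the numbers in the message, not something stated in it. A small sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helpers (not from the thread).
 * Number of per-line flush operations needed to cover one mapping of
 * a page when the cache is flushed at line_size-byte intervals.
 */
size_t flushes_per_mapping(size_t page_size, size_t line_size)
{
    assert(line_size > 0 && page_size % line_size == 0);
    return page_size / line_size;
}

/*
 * Reading a page through the kernel mapping requires flushing both the
 * kernel and the user mapping, so the per-mapping cost is doubled.
 */
size_t flushes_per_page_read(size_t page_size, size_t line_size)
{
    return 2 * flushes_per_mapping(page_size, line_size);
}
```

With a 4096-byte page and 32-byte intervals, flushes_per_mapping gives 128 and flushes_per_page_read gives 256, all on top of the memcpy that the uaccess path would need anyway.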
From: Russell K. <rm...@ar...> - 2006-12-21 15:29:28
On Thu, Dec 21, 2006 at 11:05:49AM +0100, Miklos Szeredi wrote:
> This is the dcache aliasing issue in get_user_pages() for anonymous
> pages:
>
> http://lkml.org/lkml/2006/10/7/80

Okay, I've written up my thoughts on this to the kernel community. In summary, I don't see how FUSE can work reliably on ARM in its current state, especially given the complexities of preemption etc.

I believe that using get_user_pages() to access the current process's VM is always going to be racy and unreliable no matter how much cache flushing you apply to the kernel.

Please direct further discussion on this topic to my mail on linux-kernel. Thanks.

-- 
Russell King
From: Dave H. <dhy...@gm...> - 2006-12-22 15:26:20
Posting to list...

> Please direct further discussion on this topic to my mail on
> linux-kernel. Thanks.

Which can be found here: http://lkml.org/lkml/2006/12/21/157

Excellent explanation.

-- 
Dave Hylands
Vancouver, BC, Canada
http://www.DaveHylands.com/