|
From: Mike S. <ma...@gm...> - 2011-07-18 11:53:57
|
Hello, I'm trying to run valgrind on my FUSE filesystem, and I'm running into a similar problem as this thread: http://sourceforge.net/mailarchive/message.php?msg_id=26871509 I've hit a deadlock waiting for chdir(), close(), fchdir(), and stat64() (found by running valgrind with --trace-syscalls=yes). The patch below makes valgrind work for my application. I'm not too familiar with valgrind internals - is there a reason for assuming that syscalls won't block by default? Are there any other side-effects for a patch like this? Thanks, -Mike Index: coregrind/m_syswrap/syswrap-generic.c =================================================================== --- coregrind/m_syswrap/syswrap-generic.c (revision 11899) +++ coregrind/m_syswrap/syswrap-generic.c (working copy) @@ -2848,6 +2848,7 @@ PRE(sys_chdir) { + *flags |= SfMayBlock; PRINT("sys_chdir ( %#lx(%s) )", ARG1,(char*)ARG1); PRE_REG_READ1(long, "chdir", const char *, path); PRE_MEM_RASCIIZ( "chdir(path)", ARG1 ); @@ -2878,6 +2879,7 @@ PRE(sys_close) { + *flags |= SfMayBlock; PRINT("sys_close ( %ld )", ARG1); PRE_REG_READ1(long, "close", unsigned int, fd); @@ -2929,6 +2931,7 @@ PRE(sys_fchdir) { + *flags |= SfMayBlock; PRINT("sys_fchdir ( %ld )", ARG1); PRE_REG_READ1(long, "fchdir", unsigned int, fd); } Index: coregrind/m_syswrap/syswrap-x86-linux.c =================================================================== --- coregrind/m_syswrap/syswrap-x86-linux.c (revision 11899) +++ coregrind/m_syswrap/syswrap-x86-linux.c (working copy) @@ -1404,6 +1404,7 @@ PRE(sys_stat64) { + *flags |= SfMayBlock; PRINT("sys_stat64 ( %#lx(%s), %#lx )",ARG1,(char*)ARG1,ARG2); PRE_REG_READ2(long, "stat64", char *, file_name, struct stat64 *, buf); PRE_MEM_RASCIIZ( "stat64(file_name)", ARG1 ); |
|
From: John R. <jr...@bi...> - 2011-07-18 14:31:43
|
> Is there a reason for assuming that syscalls won't block by default? There is substantial overhead to allow for the possibility that a syscall might block. This wakes up valgrind's internal machinery for threads, which is not spry. chdir and fchdir are used infrequently enough that it might not matter too much; but stat64 and close are used quite often. -- |
|
From: Mike S. <ma...@gm...> - 2011-07-18 23:26:52
|
On Mon, Jul 18, 2011 at 10:32 AM, John Reiser <jr...@bi...> wrote: >> Is there a reason for assuming that syscalls won't block by default? > > There is substantial overhead to allow for the possibility that a syscall > might block. This wakes up valgrind's internal machinery for threads, > which is not spry. chdir and fchdir are used infrequently enough that > it might not matter too much; but stat64 and close are used quite often. Ok, makes sense. Tom Hughes suggested that I create a bug and attach my patch, so that's what I did. If it is going to be a significant performance hit, maybe it would be better to have an option that can turn this on for all syscalls? That way when debugging a fuse file-system, the syscalls won't deadlock, but it won't slow things down for the general user. Also I imagine there are other filesystem related syscalls that would have the same issue - I only patched the ones that were causing me trouble. Thanks for the help, -Mike |
|
From: John R. <jr...@bi...> - 2011-07-19 03:28:21
|
> If it is going to be a significant
> performance hit, maybe it would be better to have an option that can
> turn this on for all syscalls? That way when debugging a fuse
> file-system, the syscalls won't deadlock, but it won't slow things
> down for the general user.
On Linux the only syscalls that can block for a FUSE file-system
are ones that implement the kernel's VFS layer (Virtual File System),
which uses a struct of function pointers for each relevant syscall.
So, just find that initialized struct in the Linux kernel sources,
and there is the list of syscalls that can block. From linux/fs/fuse/file.c:
-----
static const struct file_operations fuse_file_operations = {
.llseek = fuse_file_llseek,
.read = do_sync_read,
.aio_read = fuse_file_aio_read,
.write = do_sync_write,
.aio_write = fuse_file_aio_write,
.mmap = fuse_file_mmap,
.open = fuse_open,
.flush = fuse_flush,
.release = fuse_release, ### last 'close'
.fsync = fuse_fsync,
.lock = fuse_file_lock,
.flock = fuse_file_flock,
.splice_read = generic_file_splice_read,
.unlocked_ioctl = fuse_file_ioctl,
.compat_ioctl = fuse_file_compat_ioctl,
.poll = fuse_file_poll,
};
static const struct file_operations fuse_direct_io_file_operations = {
.llseek = fuse_file_llseek,
.read = fuse_direct_read,
.write = fuse_direct_write,
.mmap = fuse_direct_mmap,
.open = fuse_open,
.flush = fuse_flush,
.release = fuse_release,
.fsync = fuse_fsync,
.lock = fuse_file_lock,
.flock = fuse_file_flock,
.unlocked_ioctl = fuse_file_ioctl,
.compat_ioctl = fuse_file_compat_ioctl,
.poll = fuse_file_poll,
/* no splice_read */
};
-----
Of course
*flags |= SfMayBlock;
could be unified to use a global variable that zero except when
set by a command-line switch for FUSE:
*flags |= SfMayBlock_var;
A complimentary option that also might work is
*flags |= ((1 < n_threads) ? SfMayBlock : 0);
if the problem never occurs unless there are multiple threads already.
--
|
|
From: Mike S. <ma...@gm...> - 2011-07-20 00:01:17
|
On Mon, Jul 18, 2011 at 11:29 PM, John Reiser <jr...@bi...> wrote:
> On Linux the only syscalls that can block for a FUSE file-system
> are ones that implement the kernel's VFS layer (Virtual File System),
> which uses a struct of function pointers for each relevant syscall.
> So, just find that initialized struct in the Linux kernel sources,
> and there is the list of syscalls that can block. From linux/fs/fuse/file.c:
> -----
> static const struct file_operations fuse_file_operations = {
> .llseek = fuse_file_llseek,
> .read = do_sync_read,
> .aio_read = fuse_file_aio_read,
> .write = do_sync_write,
> .aio_write = fuse_file_aio_write,
> .mmap = fuse_file_mmap,
> .open = fuse_open,
> .flush = fuse_flush,
> .release = fuse_release, ### last 'close'
> .fsync = fuse_fsync,
> .lock = fuse_file_lock,
> .flock = fuse_file_flock,
> .splice_read = generic_file_splice_read,
> .unlocked_ioctl = fuse_file_ioctl,
> .compat_ioctl = fuse_file_compat_ioctl,
> .poll = fuse_file_poll,
> };
>
> static const struct file_operations fuse_direct_io_file_operations = {
> .llseek = fuse_file_llseek,
> .read = fuse_direct_read,
> .write = fuse_direct_write,
> .mmap = fuse_direct_mmap,
> .open = fuse_open,
> .flush = fuse_flush,
> .release = fuse_release,
> .fsync = fuse_fsync,
> .lock = fuse_file_lock,
> .flock = fuse_file_flock,
> .unlocked_ioctl = fuse_file_ioctl,
> .compat_ioctl = fuse_file_compat_ioctl,
> .poll = fuse_file_poll,
> /* no splice_read */
> };
This is a good idea, but unfortunately there are some other syscalls
that use the file-system API, but don't correspond directly to
functions in this structure. For example, chdir() isn't handled by the
file-system API, but the kernel will use the file-system API to see if
permissions on the directory are sufficient before updating the
working directory.
I tried to use the patch attached to the bug that provides the
--fuse-compatible=yes option, but valgrind would still hang on
sys_chdir() and sys_stat64 in my tests. I added those in a new patch.
-Mike
|
|
From: Julian S. <js...@ac...> - 2011-07-19 07:18:50
|
> Ok, makes sense. Tom Hughes suggested that I create a bug and attach > my patch, so that's what I did. If it is going to be a significant > performance hit, maybe it would be better to have an option that can > turn this on for all syscalls? In the past we've simply added the SfMayBlock flags on an ad-hoc basis, on demand, and for the most part that's worked pretty well. It may be that you only need to add that flag to enough syscalls to break cycles of threads waiting for each other: once one link in the cycle is broken then it won't deadlock. I guess that's not robust in the worst case though, so all affected syscalls would need to be marked. Your patch on bug 287057 looks OK, although it would be good if you could fill in at least the amd64-linux equivalents too. J |
|
From: Mike S. <ma...@gm...> - 2011-07-20 00:04:22
|
On Tue, Jul 19, 2011 at 3:17 AM, Julian Seward <js...@ac...> wrote: > In the past we've simply added the SfMayBlock flags on an ad-hoc > basis, on demand, and for the most part that's worked pretty well. That's fine, I'm happy to supply more patches in the future if I come across any others. > > It may be that you only need to add that flag to enough syscalls > to break cycles of threads waiting for each other: once one link > in the cycle is broken then it won't deadlock. I guess that's not > robust in the worst case though, so all affected syscalls would > need to be marked. > > Your patch on bug 287057 looks OK, although it would be good if you > could fill in at least the amd64-linux equivalents too. The only one in syswrap-x86-linux.c I had patched was sys_stat64(), which doesn't appear in syswrap-amd64-linux.c (nor does sys_stat()). I'll see if I can try to run my tests on a 64-bit system to see if that hangs up in any different calls. -Mike |
|
From: Mike S. <ma...@gm...> - 2011-08-04 01:33:03
|
On Tue, Jul 19, 2011 at 8:04 PM, Mike Shal <ma...@gm...> wrote: > On Tue, Jul 19, 2011 at 3:17 AM, Julian Seward <js...@ac...> wrote: >> It may be that you only need to add that flag to enough syscalls >> to break cycles of threads waiting for each other: once one link >> in the cycle is broken then it won't deadlock. I guess that's not >> robust in the worst case though, so all affected syscalls would >> need to be marked. >> >> Your patch on bug 287057 looks OK, although it would be good if you >> could fill in at least the amd64-linux equivalents too. > > The only one in syswrap-x86-linux.c I had patched was sys_stat64(), > which doesn't appear in syswrap-amd64-linux.c (nor does sys_stat()). > I'll see if I can try to run my tests on a 64-bit system to see if > that hangs up in any different calls. I finally got a chance to run on an x86_64 host. I updated the patch to avoid blocking in sys_newfstatat(), since that is used on the 64-bit system. The patch also adds sys_fstatat64(), which was needed for a 32-bit host (I added an fstatat() call to my program, so this was a new hangup). Any chance this patch for the --fuse-compatible flag can get into svn? Or if that's not the best approach, let me know and I can try to provide an alternative patch. Thanks! -Mike |