From: Slawomir R. <sla...@ed...> - 2018-04-12 13:44:17
Hello,

We have come across an apparent deadlock in the FUSE kernel module. It is easily reproducible using the LTP test suite: the "doio" binary, which performs I/O in the "rwtest02" test case, hangs intermittently in uninterruptible sleep during a write() syscall. It seems that there is a deadlock or a missing wake-up when "write" requests are interleaved with "writepage" requests.

The kernel stack of the hung process is as follows:

    [<0>] fuse_wait_on_page_writeback+0x74/0xb0
    [<0>] fuse_perform_write+0x2a3/0x5d0
    [<0>] fuse_file_write_iter+0x20e/0x2b0
    [<0>] new_sync_write+0xe5/0x140
    [<0>] __vfs_write+0x29/0x40
    [<0>] vfs_write+0xb8/0x1b0
    [<0>] SyS_write+0x55/0xc0
    [<0>] do_syscall_64+0x73/0x130
    [<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The issue can only be reproduced with the "-o big_writes" option passed to FUSE. We have reproduced it with fusexmp_fh on libfuse 2.9.4, against each of the kernels 4.4.0, 4.13.0, and 4.15.0.

To reproduce, please follow the steps below:

1. Compile and install the Linux Test Project, e.g.:

    git clone https://github.com/linux-test-project/ltp.git
    cd ltp
    make autotools
    ./configure --prefix=/opt/ltp
    make -j 4
    sudo make install

2. Prepare a test suite with the "rwtest02" test. In /opt/ltp/runtest, I created a file "doio_hang" with the following contents:

    stress__rwtest02 export LTPROOT; rwtest -N rwtest02 -c -q -i 60s -f buffered 10%25000:$TMPDIR/rw-buffered-$$

3. Mount fusexmp_fh:

    mkdir /tmp/fuse
    ./fusexmp_fh -o allow_root -o big_writes /tmp/fuse

4. Run the test:

    mkdir /tmp/ltp-test-dir
    sudo /opt/ltp/runltp -f doio_hang -d /tmp/fuse/tmp/ltp-test-dir -o /tmp/ltp-output -l /tmp/ltp-log

The test sometimes passes for us, but it only takes a few tries before it hangs. A run should last around 70-90 seconds. If it lasts longer, check the output of "ps aux | grep doio" for a process in state D, such as:

    root 26622 0.0 0.2 31716 26640 pts/8 D 13:44 0:00 /opt/ltp/testcases/bin/doio -N rwtest02

The kernel stack trace of that process (from /proc/.../stack) matches the one pasted above.
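The check in step 4 can also be scripted. The following is only a minimal sketch, assuming Python 3.7+ and the usual 11-column "ps aux" layout; the function names d_state_pids and kernel_stack are made up for this illustration and are not part of LTP or libfuse. It lists "doio" processes in state D and tries to print their kernel stacks, which normally requires root.

```python
import subprocess
from pathlib import Path


def d_state_pids(ps_output, name="doio"):
    """Return PIDs of processes in uninterruptible sleep (state D) whose
    command line contains `name`, given the text printed by `ps aux`."""
    pids = []
    for line in ps_output.splitlines():
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something we cannot parse
        stat, command = fields[7], fields[10]
        if stat.startswith("D") and name in command:
            pids.append(int(fields[1]))
    return pids


def kernel_stack(pid):
    """Read /proc/<pid>/stack (hypothetical helper; usually needs root)."""
    return Path(f"/proc/{pid}/stack").read_text()


if __name__ == "__main__":
    out = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    for pid in d_state_pids(out):
        print(f"PID {pid} is in state D")
        try:
            print(kernel_stack(pid))
        except OSError as exc:
            print(f"  could not read kernel stack: {exc}")
```

Run as root while the test appears stuck. If a "doio" process is reported and its stack shows fuse_wait_on_page_writeback, that is the hang described here.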
Note that we have also seen hangs with "-o debug" in the mount.

Does anyone have an idea of what goes wrong there, and how it can be fixed?

Best Regards,
Sławek Rudnicki