From: Sheng Y. <sh...@ya...> - 2016-08-09 04:26:24
Hi,

I just ran into some weird behavior when using a loopback device to attach a file that lives in a FUSE filesystem. Everything works fine as long as direct_io is not enabled for FUSE. As soon as direct_io is enabled, the loopback device hangs.

The immediate cause is in the direct I/O path (fuse_direct_io()): even the first read request issued by the loopback device hangs, because the page shared with userspace cannot be released. The loop0 thread ends up blocked in __lock_page(), reached via set_page_dirty_lock() from fuse_release_user_pages(), which later results in this:

[ 480.068490] INFO: task loop0:18115 blocked for more than 120 seconds.
[ 480.071210] Not tainted 4.4.13+ #10
[ 480.072970] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 480.076325] loop0 D ffff8800ba2ab918 13808 18115 2 0x00000000
[ 480.079676] ffff8800ba2ab918 ffff8800b89e8770 ffffffff8299a480 ffff88013fc95f18
[ 480.083386] ffff88013ab75200 ffff8800b89e8000 ffff8800ba2ac000 ffff88013fc95f00
[ 480.086541] 7fffffffffffffff ffff8800ba2abaa0 ffffffff81a730b0 ffff8800ba2ab930
[ 480.089602] Call Trace:
[ 480.091154] [<ffffffff81a730b0>] ? bit_wait+0x60/0x60
[ 480.093370] [<ffffffff81a728e7>] schedule+0x37/0x80
[ 480.095621] [<ffffffff81a76ded>] schedule_timeout+0x25d/0x360
[ 480.097789] [<ffffffff8110f53a>] ? __delayacct_blkio_start+0x1a/0x30
[ 480.100015] [<ffffffff81049615>] ? kvm_clock_get_cycles+0x25/0x30
[ 480.102216] [<ffffffff810d34a0>] ? ktime_get+0x90/0x110
[ 480.104143] [<ffffffff8110f53a>] ? __delayacct_blkio_start+0x1a/0x30
[ 480.106319] [<ffffffff81a730b0>] ? bit_wait+0x60/0x60
[ 480.107979] [<ffffffff81a71c5f>] io_schedule_timeout+0x9f/0x110
[ 480.109892] [<ffffffff81a730c6>] bit_wait_io+0x16/0x60
[ 480.111605] [<ffffffff81a72eb9>] __wait_on_bit_lock+0x49/0xa0
[ 480.113469] [<ffffffff810b99bd>] ? vprintk_emit+0x2fd/0x560
[ 480.115312] [<ffffffff81143847>] __lock_page+0xa7/0xb0
[ 480.117010] [<ffffffff8109ef30>] ? autoremove_wake_function+0x30/0x30
[ 480.118974] [<ffffffff8114eb3c>] set_page_dirty_lock+0x4c/0x50
[ 480.120757] [<ffffffff812f6e47>] fuse_release_user_pages.isra.19+0x47/0x60
[ 480.122624] [<ffffffff812f95a1>] fuse_direct_io+0x281/0x5d0
[ 480.124357] [<ffffffff812f992f>] __fuse_direct_read+0x3f/0x60
[ 480.126074] [<ffffffff812f9985>] fuse_direct_read_iter+0x35/0x40
[ 480.127190] [<ffffffff811a95fd>] vfs_iter_read+0x5d/0x90
[ 480.128080] [<ffffffff815b49c9>] lo_read_simple.isra.25+0x99/0x1c0
[ 480.129057] [<ffffffff815b5c84>] loop_queue_work+0x654/0x6e0
[ 480.130006] [<ffffffff8107c111>] ? kthread_worker_fn+0x61/0x1a0
[ 480.130958] [<ffffffff8107c19a>] ? kthread_worker_fn+0xea/0x1a0
[ 480.131929] [<ffffffff8107c133>] kthread_worker_fn+0x83/0x1a0
[ 480.132875] [<ffffffff8107c0b0>] ? __init_kthread_worker+0x60/0x60
[ 480.133846] [<ffffffff8107c03a>] kthread+0xea/0x100
[ 480.134716] [<ffffffff8107bf50>] ? kthread_create_on_node+0x240/0x240
[ 480.135741] [<ffffffff81a7884f>] ret_from_fork+0x3f/0x70
[ 480.136627] [<ffffffff8107bf50>] ? kthread_create_on_node+0x240/0x240
[ 480.137619] no locks held by loop0/18115.

Of course, everything works without the loopback device, and loopback works with a FUSE file that has direct_io disabled.

I am still working on this issue and trying to get to the bottom of it, but I am wondering whether anyone has an idea of what is happening here. I cannot think of a reason why the loopback device would have anything to do with a page outside of its reach.

BTW, I am testing with a modified version of the libfuse hellofs example, to which I added a file backing the read/write requests. I can provide the source code if necessary. I've seen the same behavior with a different backend as well, so I don't think it's specific to my FUSE implementation.

Thanks in advance.

--Sheng