From: WK <wk...@bn...> - 2011-06-26 21:28:09
On one of our MFS clusters we have four clients mounted. The three running RHEL5/CentOS 5 never have issues. The fourth locks up at least once a week with the /var/log/messages output below (note that the mount errors just go on forever until we reboot). It starts with khugepaged.

Googling the issue suggests that many people are seeing this with other FUSE projects, all on recent kernels. In particular, the ZFS-FUSE project has a number of threads; here is just one: http://zfs-fuse.net/issues/123

In that thread, aside from downgrading the distro, there is a recommendation to limit the memory used to 1GB or less with:

    zfs-fuse --stack-size=1024 -m 1024 --no-kstat-mount --disable-block-cache --disable-page-cache -v 1 --zfs-prefetch-disable

Is there an mfsmount equivalent for limiting memory? Any suggestions or feedback regarding this issue would be welcome.

Sincerely,
WK

LOG FILE snippet

Jun 26 13:41:25 ariel kernel: INFO: task khugepaged:52 blocked for more than 120 seconds.
Jun 26 13:41:25 ariel kernel: "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 26 13:41:25 ariel kernel: khugepaged D ffff88012fc23080 0 52 2 0x00000000
Jun 26 13:41:25 ariel kernel: ffff88012af9f900 0000000000000046 0000000000000000 ffffffff8104b9c8
Jun 26 13:41:25 ariel kernel: 0000000002dae000 ffffea000027a050 000000000000000e 0000000113d439da
Jun 26 13:41:25 ariel kernel: ffff88012afa3ad8 ffff88012af9ffd8 0000000000010518 ffff88012afa3ad8
Jun 26 13:41:25 ariel kernel: Call Trace:
Jun 26 13:41:25 ariel kernel: [<ffffffff8104b9c8>] ? flush_tlb_others_ipi+0x128/0x130
Jun 26 13:41:25 ariel kernel: [<ffffffff8110c330>] ? sync_page+0x0/0x50
Jun 26 13:41:25 ariel kernel: [<ffffffff814c9a53>] io_schedule+0x73/0xc0
Jun 26 13:41:25 ariel kernel: [<ffffffff8110c36d>] sync_page+0x3d/0x50
Jun 26 13:41:25 ariel kernel: [<ffffffff814ca17a>] __wait_on_bit_lock+0x5a/0xc0
Jun 26 13:41:25 ariel kernel: [<ffffffff8110c307>] __lock_page+0x67/0x70
Jun 26 13:41:25 ariel kernel: [<ffffffff81091ee0>] ? wake_bit_function+0x0/0x50
Jun 26 13:41:25 ariel kernel: [<ffffffff81122781>] ? lru_cache_add_lru+0x21/0x40
Jun 26 13:41:25 ariel kernel: [<ffffffff8115bf10>] lock_page+0x30/0x40
Jun 26 13:41:25 ariel kernel: [<ffffffff8115c58d>] migrate_pages+0x59d/0x5d0
Jun 26 13:41:25 ariel kernel: [<ffffffff81152b20>] ? compaction_alloc+0x0/0x370
Jun 26 13:41:25 ariel kernel: [<ffffffff811525cc>] compact_zone+0x4cc/0x600
Jun 26 13:41:25 ariel kernel: [<ffffffff8111cffc>] ? get_page_from_freelist+0x15c/0x820
Jun 26 13:41:25 ariel kernel: [<ffffffff8115297e>] compact_zone_order+0x7e/0xb0
Jun 26 13:41:25 ariel kernel: [<ffffffff81152ab9>] try_to_compact_pages+0x109/0x170
Jun 26 13:41:25 ariel kernel: [<ffffffff8111e99d>] __alloc_pages_nodemask+0x5ed/0x850
Jun 26 13:41:25 ariel kernel: [<ffffffff81150db3>] alloc_pages_vma+0x93/0x150
Jun 26 13:41:25 ariel kernel: [<ffffffff81165c4b>] khugepaged+0xa9b/0x1210
Jun 26 13:41:25 ariel kernel: [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
Jun 26 13:41:25 ariel kernel: [<ffffffff811651b0>] ? khugepaged+0x0/0x1210
Jun 26 13:41:25 ariel kernel: [<ffffffff81091b36>] kthread+0x96/0xa0
Jun 26 13:41:25 ariel kernel: [<ffffffff810141ca>] child_rip+0xa/0x20
Jun 26 13:41:25 ariel kernel: [<ffffffff81091aa0>] ? kthread+0x0/0xa0
Jun 26 13:41:25 ariel kernel: [<ffffffff810141c0>] ? child_rip+0x0/0x20
Jun 26 13:41:25 ariel kernel: INFO: task mfsmount:7808 blocked for more than 120 seconds.
Jun 26 13:41:25 ariel kernel: "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 26 13:41:25 ariel kernel: mfsmount D ffff88012fc23280 0 7808 1 0x00000000
Jun 26 13:41:25 ariel kernel: ffff8800730e1b70 0000000000000086 0000000000000000 0000000000000000
Jun 26 13:41:25 ariel kernel: ffff8800282912c0 0000000000000400 0000000000001000 0000000113d44058
Jun 26 13:41:25 ariel kernel: ffff8801121f45f8 ffff8800730e1fd8 0000000000010518 ffff8801121f45f8
Jun 26 13:41:25 ariel kernel: Call Trace:
Jun 26 13:41:25 ariel kernel: [<ffffffff814cb6e5>] rwsem_down_failed_common+0x95/0x1d0
Jun 26 13:41:25 ariel kernel: [<ffffffff81059e02>] ? finish_task_switch+0x42/0xd0
Jun 26 13:41:25 ariel kernel: [<ffffffff814cb876>] rwsem_down_read_failed+0x26/0x30
Jun 26 13:41:25 ariel kernel: [<ffffffff81264db4>] call_rwsem_down_read_failed+0x14/0x30
Jun 26 13:41:25 ariel kernel: [<ffffffff814cad74>] ? down_read+0x24/0x30
Jun 26 13:41:25 ariel kernel: [<ffffffffa0370419>] fuse_copy_fill+0x99/0x1f0 [fuse]
Jun 26 13:41:25 ariel kernel: [<ffffffffa03705b1>] fuse_copy_one+0x41/0x70 [fuse]
Jun 26 13:41:25 ariel kernel: [<ffffffffa03714c4>] fuse_dev_read+0x224/0x310 [fuse]
Jun 26 13:41:25 ariel kernel: [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
Jun 26 13:41:25 ariel kernel: [<ffffffff8116d19a>] do_sync_read+0xfa/0x140
Jun 26 13:41:25 ariel kernel: [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
Jun 26 13:41:25 ariel kernel: [<ffffffff81401e77>] ? release_sock+0xb7/0xd0
Jun 26 13:41:25 ariel kernel: [<ffffffff811fff16>] ? security_file_permission+0x16/0x20
Jun 26 13:41:25 ariel kernel: [<ffffffff8116dbc5>] vfs_read+0xb5/0x1a0
Jun 26 13:41:25 ariel kernel: [<ffffffff8116dd01>] sys_read+0x51/0x90
Jun 26 13:41:25 ariel kernel: [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
Jun 26 13:41:25 ariel kernel: INFO: task mfsmount:3885 blocked for more than 120 seconds.
Jun 26 13:41:25 ariel kernel: "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 26 13:41:25 ariel kernel: mfsmount D ffff88012fc23280 0 3885 1 0x00000000
Jun 26 13:41:25 ariel kernel: ffff880063e1bb70 0000000000000086 0000000000000000 ffff880063e1baf8
Jun 26 13:41:25 ariel kernel: ffff880028316980 ffff880063e1bb18 ffffffff8105c846 0000000113d44249
Jun 26 13:41:25 ariel kernel: ffff880044ea6678 ffff880063e1bfd8 0000000000010518 ffff880044ea6678
Jun 26 13:41:25 ariel kernel: Call Trace:
Jun 26 13:41:25 ariel kernel: [<ffffffff8105c846>] ? update_curr+0xe6/0x1e0
Jun 26 13:41:25 ariel kernel: [<ffffffff81061c61>] ? dequeue_entity+0x1a1/0x1e0
Jun 26 13:41:25 ariel kernel: [<ffffffff814cb6e5>] rwsem_down_failed_common+0x95/0x1d0
Jun 26 13:41:25 ariel kernel: [<ffffffff81059e02>] ? finish_task_switch+0x42/0xd0
Jun 26 13:41:25 ariel kernel: [<ffffffff814cb876>] rwsem_down_read_failed+0x26/0x30
Jun 26 13:41:25 ariel kernel: [<ffffffff81264db4>] call_rwsem_down_read_failed+0x14/0x30
Jun 26 13:41:25 ariel kernel: [<ffffffff814cad74>] ? down_read+0x24/0x30
Jun 26 13:41:25 ariel kernel: [<ffffffffa0370419>] fuse_copy_fill+0x99/0x1f0 [fuse]
Jun 26 13:41:25 ariel kernel: [<ffffffffa03705b1>] fuse_copy_one+0x41/0x70 [fuse]
Jun 26 13:41:25 ariel kernel: [<ffffffffa03714c4>] fuse_dev_read+0x224/0x310 [fuse]
Jun 26 13:41:25 ariel kernel: [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
Jun 26 13:41:25 ariel kernel: [<ffffffff8116d19a>] do_sync_read+0xfa/0x140
Jun 26 13:41:25 ariel kernel: [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
Jun 26 13:41:25 ariel kernel: [<ffffffff81401e77>] ? release_sock+0xb7/0xd0
Jun 26 13:41:25 ariel kernel: [<ffffffff811fff16>] ? security_file_permission+0x16/0x20
Jun 26 13:41:25 ariel kernel: [<ffffffff8116dbc5>] vfs_read+0xb5/0x1a0
Jun 26 13:41:25 ariel kernel: [<ffffffff8116dd01>] sys_read+0x51/0x90
Jun 26 13:41:25 ariel kernel: [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
Jun 26 13:41:25 ariel kernel: INFO: task mfsmount:21898 blocked for more than 120 seconds.
Jun 26 13:41:25 ariel kernel: "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 26 13:41:25 ariel kernel: mfsmount D ffff88012fc23280 0 21898 1 0x00000000
Jun 26 13:41:25 ariel kernel: ffff8800cdfe7b70 0000000000000086 0000000000000000 0000000000000000
Jun 26 13:41:25 ariel kernel: ffff8800282912c0 0000000000000400 0000000000001000 0000000113d439a9
Jun 26 13:41:25 ariel kernel: ffff8801298d45f8 ffff8800cdfe7fd8 0000000000010518 ffff8801298d45f8
Jun 26 13:41:25 ariel kernel: Call Trace:
Jun 26 13:41:25 ariel kernel: [<ffffffff814cb6e5>] rwsem_down_failed_common+0x95/0x1d0
Jun 26 13:41:25 ariel kernel: [<ffffffff81059e02>] ? finish_task_switch+0x42/0xd0
Jun 26 13:41:25 ariel kernel: [<ffffffff814cb876>] rwsem_down_read_failed+0x26/0x30
Jun 26 13:41:25 ariel kernel: [<ffffffff81264db4>] call_rwsem_down_read_failed+0x14/0x30
Jun 26 13:41:25 ariel kernel: [<ffffffff814cad74>] ? down_read+0x24/0x30
Jun 26 13:41:25 ariel kernel: [<ffffffffa0370419>] fuse_copy_fill+0x99/0x1f0 [fuse]
Jun 26 13:41:25 ariel kernel: [<ffffffffa03705b1>] fuse_copy_one+0x41/0x70 [fuse]
Jun 26 13:41:25 ariel kernel: [<ffffffffa03714c4>] fuse_dev_read+0x224/0x310 [fuse]
Jun 26 13:41:25 ariel kernel: [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
Jun 26 13:41:25 ariel kernel: [<ffffffff8116d19a>] do_sync_read+0xfa/0x140
Jun 26 13:41:25 ariel kernel: [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
Jun 26 13:41:25 ariel kernel: [<ffffffff8118bf70>] ? mntput_no_expire+0x30/0x110
Jun 26 13:41:25 ariel kernel: [<ffffffff811fff16>] ? security_file_permission+0x16/0x20
Jun 26 13:41:25 ariel kernel: [<ffffffff8116dbc5>] vfs_read+0xb5/0x1a0
Jun 26 13:41:25 ariel kernel: [<ffffffff8116dd01>] sys_read+0x51/0x90
Jun 26 13:41:25 ariel kernel: [<ffffffff81013172>] system_call_fastpath+0x16/0x1b

and so on.
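When comparing lockups across clients, it may help to tally which tasks the kernel's hung-task detector flagged. The following is a minimal sketch in Python, not part of MooseFS or any kernel tooling; the function name `summarize_hung_tasks` and the script name `hung_tasks.py` are hypothetical, and it assumes syslog lines in the "INFO: task NAME:PID blocked for more than N seconds." format shown above. It ignores the register dumps and call traces and only counts the header lines.

```python
import re
import sys
from collections import Counter

# Matches the hung-task detector's header line as it appears in syslog, e.g.
# "... kernel: INFO: task mfsmount:7808 blocked for more than 120 seconds."
# Assumption: the format matches the log snippet in this post.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than \d+ seconds"
)


def summarize_hung_tasks(lines):
    """Return a Counter of task name -> number of distinct PIDs reported as blocked.

    Hypothetical helper: only the "INFO: task ..." header lines are examined;
    the register dumps and call traces that follow them are skipped.
    """
    seen = set()
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            # A (name, PID) pair reported more than once is counted once.
            seen.add((match.group("name"), int(match.group("pid"))))
    return Counter(name for name, _pid in seen)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 hung_tasks.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for name, count in summarize_hung_tasks(log).most_common():
            print(f"{name}: {count} blocked PID(s)")
```

Run against the snippet above, it would count three distinct blocked mfsmount PIDs (7808, 3885, 21898) and one khugepaged. Deduplicating on the (name, PID) pair keeps a task that appears in several reports from being counted more than once.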