From: WK <wk...@bn...> - 2011-10-20 21:39:16
We stopped seeing the issue a few months ago. We assume recent kernel updates have mitigated the problem, or it may simply be that we have been actively upgrading our cluster hardware (especially the masters) with more RAM to avoid the 'rewrite MetaData on the hour' stall. We never saw it on our Cent5 machines, and some of them are really quite busy (IMAP servers).

Of course, now that I have declared that the situation is no longer there, I am sure it will happen in a few days just to spite me.

-bill

On 10/20/11 12:51 AM, Laurent Wandrebeck wrote:
> Unburying this thread :)
> Have you found any working solution for it?
> I've tried
> echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
> echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
> but I still get stuck tasks etc. (fewer, though) running C6 x86_64.
> Even « funnier », a user has been able to trigger it under C5
> x86_64, running a user-space program (data processing, with the data
> located on an mfs volume)! Here's the C5 trace…
>
> INFO: task polymer-na-spg:17348 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> polymer-na-sp D ffff810001025e20 0 17348 17347 (NOTLB)
> ffff81018b9d5c08 0000000000000086 ffff810210a141d0 ffffffff8863b219
> ffff81081e89d000 0000000000000007 ffff81037fb040c0 ffff81042e1f57e0
> 003ab01ace8dae8c 0000000000007bc3 ffff81037fb042a8 000000048863ff35
> Call Trace:
> [<ffffffff8863b219>] :fuse:flush_bg_queue+0x2b/0x48
> [<ffffffff8006e1db>] do_gettimeofday+0x40/0x90
> [<ffffffff80028a85>] sync_page+0x0/0x43
> [<ffffffff800637ea>] io_schedule+0x3f/0x67
> [<ffffffff80028ac3>] sync_page+0x3e/0x43
> [<ffffffff8006392e>] __wait_on_bit_lock+0x36/0x66
> [<ffffffff8003fbc7>] __lock_page+0x5e/0x64
> [<ffffffff800a0a06>] wake_bit_function+0x0/0x23
> [<ffffffff8000c2e4>] do_generic_mapping_read+0x1df/0x359
> [<ffffffff8000d0fd>] file_read_actor+0x0/0x159
> [<ffffffff8000c5aa>] __generic_file_aio_read+0x14c/0x198
> [<ffffffff800c6774>] generic_file_read+0xac/0xc5
> [<ffffffff800a09d8>] autoremove_wake_function+0x0/0x2e
> [<ffffffff8000e129>] do_mmap_pgoff+0x615/0x780
> [<ffffffff8012d629>] selinux_file_permission+0x9f/0xb6
> [<ffffffff8000b69a>] vfs_read+0xcb/0x171
> [<ffffffff80011bac>] sys_read+0x45/0x6e
> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>
> (and so on)
> Thanks!
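For anyone trying the sysfs workaround quoted above, a minimal Python sketch can confirm which value each transparent-hugepage knob is actually set to before and after the echo commands. It assumes the active choice is the bracketed token in each file (for example `always madvise [never]`), and it checks both the RHEL/CentOS 6 location (`redhat_transparent_hugepage`) and the mainline location (`transparent_hugepage`). The helpers `active_choice` and `read_thp_settings` are hypothetical names for illustration, not part of any existing tool, and the script only reads the files; changing them still needs root.

```python
import os
import re
from typing import Dict, Iterable, Optional

# Assumed sysfs roots: RHEL/CentOS 6 kernels use the "redhat_" prefix,
# mainline kernels use plain "transparent_hugepage".
THP_ROOTS = (
    "/sys/kernel/mm/redhat_transparent_hugepage",
    "/sys/kernel/mm/transparent_hugepage",
)
KNOBS = ("enabled", "defrag", "khugepaged/defrag")


def active_choice(text: str) -> Optional[str]:
    """Return the bracketed (active) token, e.g. 'always madvise [never]' -> 'never'.

    Returns None when nothing is bracketed; some kernels expose
    khugepaged/defrag as a plain 0/1 value instead of a choice list.
    """
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_thp_settings(roots: Iterable[str] = THP_ROOTS) -> Dict[str, str]:
    """Map each readable THP knob path to its active value (or raw contents)."""
    settings: Dict[str, str] = {}
    for root in roots:
        for knob in KNOBS:
            path = os.path.join(root, knob)
            try:
                with open(path) as fh:
                    raw = fh.read().strip()
            except OSError:
                continue  # knob absent on this kernel, or not readable
            choice = active_choice(raw)
            settings[path] = choice if choice is not None else raw
    return settings


if __name__ == "__main__":
    for path, value in read_thp_settings().items():
        print(f"{path}: {value}")
```

On a machine where the workaround took effect, the RHEL 6 `defrag` entry should report `never` and `khugepaged/defrag` should report `no`; paths that do not exist on the running kernel are silently skipped.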