From: Robert D. <ro...@in...> - 2011-03-10 02:20:27
Hello,

I have a few servers that run MooseFS without problems, but one of them keeps having issues when running mfschunkserver. The process runs for approximately three hours, then drops its connection to the master. After that I can stop the process and start it again, but when I restart mfschunkserver the system becomes unstable and hard-locks.

This is the output I get in messages. (The last few lines are irrelevant; they just show that I was able to plug a USB keyboard into the system, tried restarting mfschunkserver, then had to reboot.)

Mar 9 17:54:50 brickc kernel: [11017.971891] Pid: 1549, comm: mfschunkserver Tainted: G D 2.6.35-22-server #33-Ubuntu DH55HC/
Mar 9 17:54:50 brickc kernel: [11017.973678] RIP: 0010:[<ffffffff8110d241>] [<ffffffff8110d241>] page_evictable+0x21/0x80
Mar 9 17:54:50 brickc kernel: [11017.975470] RSP: 0018:ffff880378931438 EFLAGS: 00010286
Mar 9 17:54:50 brickc kernel: [11017.977256] RAX: fff688000668a848 RBX: ffffea0009ea1b88 RCX: ffffffffffffffd0
Mar 9 17:54:50 brickc kernel: [11017.979067] RDX: 020000000000080d RSI: 0000000000000000 RDI: ffffea0009ea1b88
Mar 9 17:54:50 brickc kernel: [11017.980849] RBP: ffff880378931438 R08: dead000000200200 R09: dead000000100100
Mar 9 17:54:50 brickc kernel: [11017.982637] R10: ffff8801000013d0 R11: 0000000000000000 R12: ffffea0009ea1bb0
Mar 9 17:54:50 brickc kernel: [11017.984417] R13: ffff880378931878 R14: ffff8803789316a8 R15: ffff8801000012d8
Mar 9 17:54:50 brickc kernel: [11017.986230] FS: 00007f7178d99700(0000) GS:ffff880001e20000(0000) knlGS:0000000000000000
Mar 9 17:54:50 brickc kernel: [11017.988035] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 9 17:54:50 brickc kernel: [11017.989887] CR2: 00007f7178123000 CR3: 0000000419f6d000 CR4: 00000000000006e0
Mar 9 17:54:50 brickc kernel: [11017.991715] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 9 17:54:50 brickc kernel: [11017.993577] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 9 17:54:50 brickc kernel: [11017.995448] Process mfschunkserver (pid: 1549, threadinfo ffff880378930000, task ffff8804169f16e0)
Mar 9 17:54:50 brickc kernel: [11017.999197] ffff880378931548 ffffffff8110ed70 0000000000000000 ffff8803789314f8
Mar 9 17:54:50 brickc kernel: [11017.999229] <0> 0000000000000000 ffff88000668a848 0000000000000000 0000000000000000
Mar 9 17:54:50 brickc kernel: [11018.001143] <0> 0000000000000000 0000000000000001 ffff8803789314a8 ffffffff81149b01
Mar 9 17:54:50 brickc kernel: [11018.006906] [<ffffffff8110ed70>] shrink_page_list+0x100/0x580
Mar 9 17:54:50 brickc kernel: [11018.008839] [<ffffffff81149b01>] ? mem_cgroup_del_lru_list+0x21/0xa0
Mar 9 17:54:50 brickc kernel: [11018.010790] [<ffffffff81149c09>] ? mem_cgroup_del_lru+0x39/0x40
Mar 9 17:54:50 brickc kernel: [11018.012724] [<ffffffff8110d98b>] ? isolate_lru_pages+0xdb/0x260
Mar 9 17:54:50 brickc kernel: [11018.014637] [<ffffffff8110f4bd>] shrink_inactive_list+0x2cd/0x7f0
Mar 9 17:54:50 brickc kernel: [11018.016550] [<ffffffff81107337>] ? __alloc_pages_slowpath+0x1a7/0x590
Mar 9 17:54:50 brickc kernel: [11018.018462] [<ffffffff8110d772>] ? get_scan_count+0x172/0x2b0
Mar 9 17:54:50 brickc kernel: [11018.020328] [<ffffffff8110fb8b>] shrink_zone+0x1ab/0x230
Mar 9 17:54:50 brickc kernel: [11018.022159] [<ffffffff8110fc93>] shrink_zones+0x83/0x130
Mar 9 17:54:50 brickc kernel: [11018.023986] [<ffffffff8110fdde>] do_try_to_free_pages+0x9e/0x360
Mar 9 17:54:50 brickc kernel: [11018.025811] [<ffffffff8111024b>] try_to_free_pages+0x6b/0x70
Mar 9 17:54:50 brickc kernel: [11018.027612] [<ffffffff8110740a>] __alloc_pages_slowpath+0x27a/0x590
Mar 9 17:54:50 brickc kernel: [11018.029414] [<ffffffff8122b77a>] ? __jbd2_log_space_left+0x1a/0x40
Mar 9 17:54:50 brickc kernel: [11018.031220] [<ffffffff81107884>] __alloc_pages_nodemask+0x164/0x1d0
Mar 9 17:54:50 brickc kernel: [11018.033020] [<ffffffff811397ba>] alloc_pages_current+0x9a/0x100
Mar 9 17:54:50 brickc kernel: [11018.034801] [<ffffffff81100da7>] __page_cache_alloc+0x87/0x90
Mar 9 17:54:50 brickc kernel: [11018.036600] [<ffffffff8110215c>] grab_cache_page_write_begin+0x7c/0xc0
Mar 9 17:54:50 brickc kernel: [11018.038474] [<ffffffff811f1964>] ext4_da_write_begin+0x144/0x290
Mar 9 17:54:50 brickc kernel: [11018.040825] [<ffffffff811f21ed>] ? ext4_da_write_end+0xfd/0x2e0
Mar 9 17:54:50 brickc kernel: [11018.043165] [<ffffffff8104ee12>] ? enqueue_entity+0x132/0x1b0
Mar 9 17:54:50 brickc kernel: [11018.045448] [<ffffffff810ffb36>] ? iov_iter_copy_from_user_atomic+0x96/0x170
Mar 9 17:54:50 brickc kernel: [11018.047769] [<ffffffff810ffe62>] generic_perform_write+0xc2/0x1d0
Mar 9 17:54:50 brickc kernel: [11018.049582] [<ffffffff810fffd4>] generic_file_buffered_write+0x64/0xa0
Mar 9 17:54:50 brickc kernel: [11018.051321] [<ffffffff811028e0>] __generic_file_aio_write+0x240/0x470
Mar 9 17:54:50 brickc kernel: [11018.053052] [<ffffffff810901ed>] ? futex_wait_queue_me+0xcd/0x110
Mar 9 17:54:50 brickc kernel: [11018.054751] [<ffffffff81102b75>] generic_file_aio_write+0x65/0xd0
Mar 9 17:54:50 brickc kernel: [11018.056442] [<ffffffff811e77a9>] ext4_file_write+0x39/0xb0
Mar 9 17:54:50 brickc kernel: [11018.058204] [<ffffffff81152bea>] do_sync_write+0xda/0x120
Mar 9 17:54:50 brickc kernel: [11018.059901] [<ffffffff8159e76e>] ? _raw_spin_lock+0xe/0x20
Mar 9 17:54:50 brickc kernel: [11018.061568] [<ffffffff81090a62>] ? futex_wake+0x112/0x130
Mar 9 17:54:50 brickc kernel: [11018.063232] [<ffffffff8128f208>] ? apparmor_file_permission+0x18/0x20
Mar 9 17:54:50 brickc kernel: [11018.064900] [<ffffffff8125e7a6>] ? security_file_permission+0x16/0x20
Mar 9 17:54:50 brickc kernel: [11018.066565] [<ffffffff81152ec8>] vfs_write+0xb8/0x1a0
Mar 9 17:54:50 brickc kernel: [11018.068234] [<ffffffff81153862>] sys_pwrite64+0x82/0xa0
Mar 9 17:54:50 brickc kernel: [11018.069913] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
Mar 9 17:54:50 brickc kernel: [11018.077405] RSP <ffff880378931438>
Mar 9 17:54:50 brickc kernel: [11018.079482] ---[ end trace 5c000d67753ebd63 ]---
Mar 9 17:59:04 brickc kernel: [11271.114370] usb 2-1.3: new low speed USB device using ehci_hcd and address 4
Mar 9 17:59:04 brickc kernel: [11271.253770] input: USB Keyboard as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.3/2-1.3:1.0/input/input5
Mar 9 17:59:04 brickc kernel: [11271.259128] generic-usb 0003:04D9:1603.0003: input,hidraw0: USB HID v1.10 Keyboard [ USB Keyboard] on usb-0000:00:1d.0-1.3/input0
Mar 9 17:59:04 brickc kernel: [11271.279819] input: USB Keyboard as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.3/2-1.3:1.1/input/input6
Mar 9 17:59:04 brickc kernel: [11271.284307] generic-usb 0003:04D9:1603.0004: input,hidraw1: USB HID v1.10 Device [ USB Keyboard] on usb-0000:00:1d.0-1.3/input1
Mar 9 18:06:20 brickc kernel: imklog 4.2.0, log source = /proc/kmsg started.

Any ideas for debugging this issue?
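
As a reading aid for the trace above, the frames that the kernel does not prefix with "?" (the "?" marks addresses found on the stack that may not belong to the actual call chain) can be pulled out with a short script. This is only a minimal sketch, assuming the syslog-style lines shown in this message; the function name `reliable_frames` and the regular expression are illustrative assumptions, not part of MooseFS or any kernel tool.

```python
import re

# One frame looks like "[<ffffffff8110ed70>] shrink_page_list+0x100/0x580",
# optionally with "? " between the address and the symbol name.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]{16}>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def reliable_frames(log_text):
    """Return the symbol names of frames not marked with '?', in log order."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        if match.group(1) is None:  # no "?" marker on this frame
            frames.append(match.group(2))
    return frames

# A few lines copied from the log above (timestamps and host prefix dropped).
sample = """
RIP: 0010:[<ffffffff8110d241>] [<ffffffff8110d241>] page_evictable+0x21/0x80
[<ffffffff8110ed70>] shrink_page_list+0x100/0x580
[<ffffffff81149b01>] ? mem_cgroup_del_lru_list+0x21/0xa0
[<ffffffff8110f4bd>] shrink_inactive_list+0x2cd/0x7f0
"""

for name in reliable_frames(sample):
    print(name)
```

The RIP line is matched as well, so the faulting function (page_evictable here) comes first, followed by its callers.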