From: Michael H. <mik...@gm...> - 2010-07-02 15:59:35
We have a glusterfs client that crashes periodically in the fuse code. It seems to crash when under virtually no load (cpu, memory, network). We've been pulling our hair out trying to understand why. Any insight would be greatly appreciated.

== Versions ==
gentoo on ec2 with kernel: 2.6.18
glusterfs: 3.0.2
libfuse: 2.8.1
modfuse: 2.8.1

== Kernel Output ==
This is example1.ec2 (Linux x86_64 2.6.18-xenU-ec2-v1.0) 20:55:17

example1 login: [9691928.361493] Unable to handle kernel paging request at 0000000000100108 RIP:
[9691928.361505] [<ffffffff88000af5>] :fuse:request_end+0x45/0x110
[9691928.361525] PGD 1c06fc067 PUD 1c0f6d067 PMD 0
[9691928.361531] Oops: 0002 [1] SMP
[9691928.361534] CPU 0
[9691928.361538] Modules linked in: autofs4 ip_conntrack nfnetlink dm_mod loop fuse
[9691928.361546] Pid: 5558, comm: glusterfs Not tainted 2.6.18-xenU-ec2-v1.0 #2
[9691928.361551] RIP: e030:[<ffffffff88000af5>] [<ffffffff88000af5>] :fuse:request_end+0x45/0x110
[9691928.361561] RSP: e02b:ffff8801c0169cc8 EFLAGS: 00010246
[9691928.361565] RAX: 0000000000200200 RBX: ffff88017189f328 RCX: ffff88017189f338
[9691928.361570] RDX: 0000000000100100 RSI: ffff88017189f328 RDI: ffff8801d9ed3200
[9691928.361575] RBP: ffff8801d9ed3200 R08: e000000000000000 R09: ffffffff80495800
[9691928.361579] R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000
[9691928.361583] R13: 0000000000000000 R14: 0000000000000030 R15: 0000000000000000
[9691928.361589] FS: 00002afc28cf9950(0000) GS:ffffffff804f1000(0000) knlGS:0000000000000000
[9691928.361593] CS: e033 DS: 0000 ES: 0000
[9691928.361596] Process glusterfs (pid: 5558, threadinfo ffff8801c0168000, task ffff88000107c7f0)
[9691928.361600] Stack: 0000000000000000 ffff8801d9ed3200 ffff88017189f328 ffffffff88001428
[9691928.361609] ffff8801c0169e88 ffff8801d2b2d2c0 ffff88000107c7f0 ffff8801d9e0a3a0
[9691928.361616] ffff8801d94364b0 ffff8801d9c33a60 ffff8801d8f97580 0000000000000002
[9691928.361622] Call Trace:
[9691928.361629] [<ffffffff88001428>] :fuse:fuse_dev_readv+0x418/0x4f0
[9691928.361638] [<ffffffff80369b57>] inode_has_perm+0x67/0x90
[9691928.361644] [<ffffffff80234d13>] __wake_up+0x43/0x70
[9691928.361650] [<ffffffff80369c34>] file_has_perm+0xb4/0xf0
[9691928.361657] [<ffffffff80289ad0>] default_wake_function+0x0/0x10
[9691928.361664] [<ffffffff802cd3af>] do_readv_writev+0x1df/0x340
[9691928.361671] [<ffffffff88001500>] :fuse:fuse_dev_read+0x0/0x20
[9691928.361677] [<ffffffff802cd673>] sys_readv+0x53/0xc0
[9691928.361683] [<ffffffff80269252>] system_call+0x86/0x8b
[9691928.361688] [<ffffffff802691cc>] system_call+0x0/0x8b
[9691928.361691]
[9691928.361693]
[9691928.361693] Code: 48 89 42 08 48 89 10 48 c7 41 08 00 02 20 00 f6 46 30 08 48
[9691928.361709] RIP [<ffffffff88000af5>] :fuse:request_end+0x45/0x110
[9691928.361716] RSP <ffff8801c0169cc8>
[9691928.361719] CR2: 0000000000100108
[9691928.361723] <3>BUG: soft lockup detected on CPU#1!
[9691938.124064]
[9691938.124066] Call Trace:
[9691938.124069] <IRQ> [<ffffffff802b58e6>] softlockup_tick+0xf6/0x120
[9691938.124085] [<ffffffff80276466>] timer_interrupt+0x416/0x480
[9691938.124092] [<ffffffff80211ee1>] handle_IRQ_event+0x51/0xa0
[9691938.124096] [<ffffffff802b5cbb>] __do_IRQ+0xcb/0x150
[9691938.124102] [<ffffffff80269f04>] call_softirq+0x1c/0x28
[9691938.124107] [<ffffffff8027499d>] do_IRQ+0x6d/0x90
[9691938.124114] [<ffffffff803ae29f>] evtchn_do_upcall+0xef/0x160
[9691938.124119] [<ffffffff80269a3a>] do_hypervisor_callback+0x1e/0x2c
[9691938.124122] <EOI> [<ffffffff80241ab0>] generic_drop_inode+0x0/0x170
[9691938.124131] [<ffffffff8026df07>] .text.lock.spinlock+0x5/0x8e
[9691938.124143] [<ffffffff88001927>] :fuse:request_send+0x27/0x360
[9691938.124150] [<ffffffff8800025a>] :fuse:queue_request+0x6a/0x90
[9691938.124157] [<ffffffff88002e96>] :fuse:fuse_lookup+0x96/0x250
[9691938.124164] [<ffffffff88002d72>] :fuse:fuse_dentry_revalidate+0xf2/0x180
[9691938.124169] [<ffffffff8020eea3>] dput+0x23/0x170
[9691938.124175] [<ffffffff802d8381>] prune_one_dentry+0x81/0xa0
[9691938.124181] [<ffffffff80226dc3>] d_alloc+0x183/0x200
[9691938.124185] [<ffffffff8020e98e>] do_lookup+0xde/0x1d0
[9691938.124190] [<ffffffff8020a704>] __link_path_walk+0xa44/0xfa0
[9691938.124195] [<ffffffff80210489>] link_path_walk+0x89/0x140
[9691938.124202] [<ffffffff80279a85>] xen_send_IPI_mask+0xc5/0x110
[9691938.124206] [<ffffffff8020e70e>] do_path_lookup+0x28e/0x320
[9691938.124211] [<ffffffff80227eda>] __path_lookup_intent_open+0x6a/0xd0
[9691938.124217] [<ffffffff8021e071>] open_namei+0x81/0x730
[9691938.124222] [<ffffffff8022c1dc>] do_filp_open+0x1c/0x40
[9691938.124227] [<ffffffff8021cb48>] do_sys_open+0x58/0xf0
[9691938.124232] [<ffffffff80269252>] system_call+0x86/0x8b
[9691938.124236] [<ffffffff802691cc>] system_call+0x0/0x8b
[9691938.124240]
[9691941.034262] BUG: soft lockup detected on CPU#0!
[9691941.034278]
[9691941.034279] Call Trace:
[9691941.034282] <IRQ> [<ffffffff802b58e6>] softlockup_tick+0xf6/0x120
[9691941.034298] [<ffffffff80276466>] timer_interrupt+0x416/0x480
[9691941.034304] [<ffffffff80211ee1>] handle_IRQ_event+0x51/0xa0
[9691941.034309] [<ffffffff802b5cbb>] __do_IRQ+0xcb/0x150
[9691941.034314] [<ffffffff80269f04>] call_softirq+0x1c/0x28
[9691941.034319] [<ffffffff8027499d>] do_IRQ+0x6d/0x90
[9691941.034325] [<ffffffff803ae29f>] evtchn_do_upcall+0xef/0x160
[9691941.034331] [<ffffffff80269a3a>] do_hypervisor_callback+0x1e/0x2c
[9691941.034334] <EOI> [<ffffffff8036e8e0>] selinux_inode_permission+0x0/0xc0
[9691941.034343] [<ffffffff8026df04>] .text.lock.spinlock+0x2/0x8e
[9691941.034353] [<ffffffff88001927>] :fuse:request_send+0x27/0x360
[9691941.034361] [<ffffffff88005e81>] :fuse:fuse_statfs+0x91/0x120
[9691941.034368] [<ffffffff802cc80b>] vfs_statfs+0x6b/0xa0
[9691941.034373] [<ffffffff802cca5a>] vfs_statfs_native+0x2a/0x70
[9691941.034379] [<ffffffff802ccbaa>] sys_statfs+0x5a/0xc0
[9691941.034383] [<ffffffff8020fdc0>] free_pages_and_swap_cache+0x80/0xa0
[9691941.034388] [<ffffffff80214098>] unmap_region+0x128/0x160
[9691941.034394] [<ffffffff8021dd4c>] remove_vma+0x4c/0x60
[9691941.034398] [<ffffffff80213022>] do_munmap+0x292/0x2e0
[9691941.034403] [<ffffffff80269252>] system_call+0x86/0x8b
[9691941.034407] [<ffffffff802691cc>] system_call+0x0/0x8b
[9691941.034410]
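
One possible reading of the register dump, offered as a sketch and not a diagnosis: RDX (0x100100) and RAX (0x200200) match the stock Linux list poison values that list_del() writes into a removed entry, and CR2 is RDX + 8, which on x86_64 is the offset of the prev pointer in struct list_head. That would be consistent with request_end() unlinking a request that had already been removed from its list. The check below assumes this Xen kernel build keeps the mainline poison constants; the dictionary and the helper names (classify, fault_offset) are hypothetical, and the values are copied from the oops above.

```python
# Mainline list poison values (ASSUMPTION: unchanged in this 2.6.18 Xen build).
LIST_POISON1 = 0x00100100  # list_del() stores this in entry->next
LIST_POISON2 = 0x00200200  # list_del() stores this in entry->prev

# Register values copied from the oops in this message.
oops = {"RAX": 0x200200, "RDX": 0x100100, "CR2": 0x100108}


def classify(value):
    """Return the poison constant name matching value, or None."""
    if value == LIST_POISON1:
        return "LIST_POISON1"
    if value == LIST_POISON2:
        return "LIST_POISON2"
    return None


def fault_offset(cr2, base):
    """Distance of the faulting address from a base register value."""
    return cr2 - base


for reg in ("RAX", "RDX"):
    print(reg, hex(oops[reg]), classify(oops[reg]))

# An offset of 8 would point at the prev field of a 64-bit struct list_head.
print("CR2 - RDX =", fault_offset(oops["CR2"], oops["RDX"]))
```

If that reading holds, the two soft lockups in request_send() could be follow-on effects of the first oops (for example, a lock still held by the crashed task), but the log alone does not confirm this.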