Re: [SSI-devel] SSI-1.2.2-FC shm oops
From: Laura R. <lau...@hp...> - 2005-04-12 19:06:12
Hi Roger,

I'll check in the fix then. This fix only deals with the shm structures,
so it shouldn't be related to the unixnm.c panic at all.

laura

Roger Tsang wrote:

> Laura,
>
> Thanks. I've incorporated your patch into my kernel recompile last
> night and the cluster seems to be running fine so far after two
> failovers, once on each initnode. I hope this is not related to the
> unixnm.c oops because I assumed that was due to kernel networking
> options packet socket.
>
> -Roger
>
> On Apr 11, 2005 8:54 PM, Laura Ramirez <lau...@hp...> wrote:
>
>>Hi Roger,
>>
>>Looking at the shm nodedown code, I saw some locking that didn't look
>>right. I have attached a patch file with a fix. I don't know if this
>>will fix your shm panic, but if you want to give it a try,
>>please let me know how it goes. (use -p0 to apply patch)
>>
>>Also, is it possible to get a netdump image if it does panic again,
>>or, if you get a kdb prompt, dump the following:
>>kdb> bt
>>
>>kdb> md shm_ids
>>
>>kdb> md cfs_shm_node_mnts
>>
>>thanks
>>
>>laura
>>
>>Roger Tsang wrote:
>>
>>>Hi I'm using SSI-1.2.2-FC2 9smp with Lustre-1.2.4 patch. node2 is a
>>>failover node, so I failed over to this node once and after a few
>>>hours running as the only node in the cluster I got the following.
>>>
>>>-Roger
>>>
>>>Apr 11 19:25:42 node2 kernel: ------------[ cut here ]------------
>>>Apr 11 19:25:42 node2 kernel: kernel BUG at shm.c:232!
>>>Apr 11 19:25:42 node2 kernel: invalid operand: 0000
>>>Apr 11 19:25:42 node2 kernel: ipt_REJECT ipt_multiport ipt_state ip_conntrack ipt_TCPMSS iptable_filter ip_tables loop nfsd cls_u32 sch_sfq sch_htb tun microcode ide-cd sr_mod cdrom floppy
>>>Apr 11 19:25:42 node2 kernel: CPU: 0
>>>Apr 11 19:25:42 node2 kernel: EIP: 0060:[<c01c9a90>] Not tainted
>>>Apr 11 19:25:42 node2 kernel: EFLAGS: 00010246
>>>Apr 11 19:25:42 node2 kernel:
>>>Apr 11 19:25:42 node2 kernel: EIP is at shm_close [kernel] 0xb0 (2.4.22-1.2199.nptl_ssi_9smp)
>>>Apr 11 19:25:42 node2 kernel: eax: d8ad9000 ebx: c05f0fb0 ecx: c05f0fb0 edx: 00000000
>>>Apr 11 19:25:42 node2 kernel: esi: 02000000 edi: bd311000 ebp: d3fcde64 esp: d3fcde5c
>>>Apr 11 19:25:42 node2 kernel: ds: 0068 es: 0068 ss: 0068
>>>Apr 11 19:25:42 node2 kernel: Process httpd (pid: 132848, stackpage=d3fcd000)
>>>Apr 11 19:25:42 node2 kernel: Call Trace:
>>>Apr 11 19:25:42 node2 kernel: [<c0139c86>] exit_mmap [kernel] 0x166 (0xd3fcde68)
>>>Apr 11 19:25:42 node2 kernel: [<c011f3cc>] mmput [kernel] 0x4c (0xd3fcde90)
>>>Apr 11 19:25:42 node2 kernel: [<c0125319>] do_exit [kernel] 0xe9 (0xd3fcdea4)
>>>Apr 11 19:25:42 node2 kernel: [<c01256e2>] do_group_exit [kernel] 0x32 (0xd3fcdec4)
>>>Apr 11 19:25:42 node2 kernel: [<c012ecf4>] get_signal_to_deliver [kernel] 0x2b4 (0xd3fcded8)
>>>Apr 11 19:25:42 node2 kernel: [<c01142ff>] restore_i387_fxsave [kernel] 0xaf (0xd3fcdee8)
>>>Apr 11 19:25:42 node2 kernel: [<c010b91f>] do_signal [kernel] 0x4f (0xd3fcdf1c)
>>>Apr 11 19:25:42 node2 kernel: [<c0109a68>] restore_sigcontext [kernel] 0x458 (0xd3fcdf28)
>>>Apr 11 19:25:42 node2 kernel: [<c0109ed6>] sys_sigreturn [kernel] 0x106 (0xd3fcdf90)
>>>Apr 11 19:25:42 node2 kernel: [<c010bb10>] signal_return [kernel] 0x14 (0xd3fcdfc0)
>>>Apr 11 19:25:42 node2 kernel:
>>>Apr 11 19:25:42 node2 kernel: Code: 0f 0b e8 00 ea db 38 c0 eb a5 8d b6 00 00 00 00 a1 c4 0f 5f
>>>Apr 11 19:27:17 node2 kernel: ------------[ cut here ]------------
>>>Apr 11 19:27:17 node2 kernel: kernel BUG at shm.c:169!
>>>Apr 11 19:27:17 node2 kernel: invalid operand: 0000
>>>Apr 11 19:27:17 node2 kernel: ipt_REJECT ipt_multiport ipt_state ip_conntrack ipt_TCPMSS iptable_filter ip_tables loop nfsd cls_u32 sch_sfq sch_htb tun microcode ide-cd sr_mod cdrom floppy
>>>Apr 11 19:27:17 node2 kernel: CPU: 0
>>>Apr 11 19:27:17 node2 kernel: EIP: 0060:[<c01c9930>] Not tainted
>>>Apr 11 19:27:17 node2 kernel: EFLAGS: 00010246
>>>Apr 11 19:27:17 node2 kernel:
>>>Apr 11 19:27:17 node2 kernel: EIP is at shm_open [kernel] 0x60 (2.4.22-1.2199.nptl_ssi_9smp)
>>>Apr 11 19:27:17 node2 kernel: eax: d8ad9000 ebx: d40e3c80 ecx: bf000000 edx: 00000000
>>>Apr 11 19:27:17 node2 kernel: esi: d5d40600 edi: 00000000 ebp: d4955ec4 esp: d4955ec4
>>>Apr 11 19:27:17 node2 kernel: ds: 0068 es: 0068 ss: 0068
>>>Apr 11 19:27:17 node2 kernel: Process httpd (pid: 132813, stackpage=d4955000)
>>>Apr 11 19:27:17 node2 kernel: Call Trace:
>>>Apr 11 19:27:17 node2 kernel: [<c011f8a9>] copy_mm [kernel] 0x389 (0xd4955ec8)
>>>Apr 11 19:27:17 node2 kernel: [<c01201f9>] __copy_process [kernel] 0x399 (0xd4955f04)
>>>Apr 11 19:27:17 node2 kernel: [<c0120a22>] __do_fork [kernel] 0x52 (0xd4955f4c)
>>>Apr 11 19:27:17 node2 kernel: [<c0107e65>] sys_clone [kernel] 0x45 (0xd4955f9c)
>>>Apr 11 19:27:17 node2 kernel: [<c010bad7>] system_call [kernel] 0x33 (0xd4955fc0)
>>>Apr 11 19:27:17 node2 kernel:
>>>Apr 11 19:27:17 node2 kernel: Code: 0f 0b a9 00 ea db 38 c0 eb d2 8d b6 00 00 00 00 a1 c4 0f 5f
>>>
>>
>>Index: ipc/shm.c
>>===================================================================
>>RCS file: /cvsroot/ssic-linux/openssi/kernel/ipc/shm.c,v
>>retrieving revision 1.2.2.25
>>diff -u -p -r1.2.2.25 shm.c
>>--- ipc/shm.c 17 Dec 2004 22:21:13 -0000 1.2.2.25
>>+++ ipc/shm.c 12 Apr 2005 00:28:05 -0000
>>@@ -1235,12 +1235,14 @@ ipc_shm_nodedown(clusternode_t node)
>>                 }
>>             }
>>             else {
>>+                int id = shp->id;
>>+                ipc_get_locks(id, &shm_ids, 1);
>>                 shp->shm_flags |= SHM_DEST;
>>                 if (shp->shm_nattch == 0) {
>>-                    ipc_get_locks(shp->id, &shm_ids, 1);
>>                     ssi_local_destroy(shp);
>>-                    up(&shm_ids.sem);
>>+                    id = 0;
>>                 }
>>+                ipc_drop_locks(id, &shm_ids, 1);
>>             }
>>         }
>>     }
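
For readers following the patch, here is a rough single-threaded model of the locking shape it seems to move to: take the locks before touching the segment flags, and release them at one point on both paths. This is only a sketch. Every `toy_*` name and `SHM_DEST_FLAG` is made up for illustration, the "locks" are plain booleans rather than kernel primitives, and reading `id = 0` as "the segment's own lock is gone after destroy, so only the table-wide lock is left to drop" is an assumption, not something the thread confirms about `ipc_get_locks`/`ipc_drop_locks`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SHM_DEST_FLAG 0x1u  /* hypothetical stand-in for SHM_DEST */

/* Toy stand-ins for the kernel structures; all names are hypothetical. */
struct toy_seg {
    bool lock_held;   /* models the per-segment lock */
    unsigned flags;
    int nattch;       /* number of current attaches */
    bool destroyed;
};

struct toy_ids {
    bool sem_held;    /* models the table-wide semaphore */
};

/* Take the table-wide semaphore, then the per-segment lock. */
static void toy_get_locks(struct toy_ids *ids, struct toy_seg *seg)
{
    assert(!ids->sem_held);
    ids->sem_held = true;
    if (seg != NULL) {
        assert(!seg->lock_held);
        seg->lock_held = true;
    }
}

/* Drop the per-segment lock (if one is passed), then the semaphore. */
static void toy_drop_locks(struct toy_ids *ids, struct toy_seg *seg)
{
    if (seg != NULL) {
        assert(seg->lock_held);
        seg->lock_held = false;
    }
    assert(ids->sem_held);
    ids->sem_held = false;
}

/* Destroying a segment requires both locks and consumes the segment lock. */
static void toy_destroy(struct toy_ids *ids, struct toy_seg *seg)
{
    assert(ids->sem_held && seg->lock_held);
    seg->destroyed = true;
    seg->lock_held = false;
}

/* Node-down handling for one segment, following the shape of the patched code. */
static void toy_nodedown(struct toy_ids *ids, struct toy_seg *seg)
{
    struct toy_seg *to_unlock = seg;

    toy_get_locks(ids, seg);         /* lock before modifying the flags */
    seg->flags |= SHM_DEST_FLAG;
    if (seg->nattch == 0) {
        toy_destroy(ids, seg);
        to_unlock = NULL;            /* assumed analogue of "id = 0" */
    }
    toy_drop_locks(ids, to_unlock);  /* single release point on both paths */
}
```

In this model, calling `toy_nodedown` on a segment with `nattch == 0` marks it destroyed, while a segment that is still attached is only flagged; in both cases no lock is left held afterwards. The pre-patch code, by contrast, set the flag without holding any lock and only locked around the destroy.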