Re: [SSI-devel] SSI-1.2.2-FC shm oops
From: Roger T. <rog...@gm...> - 2005-04-12 11:56:03
Laura,

Thanks. I've incorporated your patch into my kernel recompile last night and the cluster seems to be running fine so far after two failovers, once on each initnode. I hope this is not related to the unixnm.c oops because I assumed that was due to kernel networking options packet socket.

-Roger

On Apr 11, 2005 8:54 PM, Laura Ramirez <lau...@hp...> wrote:
> Hi Roger,
>
> Looking at the shm nodedown code, i saw some locking that didnt look
> right. I have attached a patch file with a fix. I dont know if this
> will fix your shm panic, but if you want to give it a try,
> please let me know how it goes. (use -p0 to apply patch)
>
> Also, is it possible to get a netdump image, if it does panic again,
> or if you get a kdb prompt dump the following:
> kdb> bt
>
> kdb> md shm_ids
>
> kdb> md cfs_shm_node_mnts
>
> thanks
>
> laura
>
> Roger Tsang wrote:
> > Hi I'm using SSI-1.2.2-FC2 9smp with Lustre-1.2.4 patch. node2 is a
> > failover node, so I failed over to this node once and after a few
> > hours running as the only node in the cluster I got the following.
> >
> > -Roger
> >
> > Apr 11 19:25:42 node2 kernel: ------------[ cut here ]------------
> > Apr 11 19:25:42 node2 kernel: kernel BUG at shm.c:232!
> > Apr 11 19:25:42 node2 kernel: invalid operand: 0000
> > Apr 11 19:25:42 node2 kernel: ipt_REJECT ipt_multiport ipt_state ip_conntrack ipt_TCPMSS iptable_filter ip_tables loop nfsd cls_u32 sch_sfq sch_htb tun microcode ide-cd sr_mod cdrom floppy
> > Apr 11 19:25:42 node2 kernel: CPU: 0
> > Apr 11 19:25:42 node2 kernel: EIP: 0060:[<c01c9a90>] Not tainted
> > Apr 11 19:25:42 node2 kernel: EFLAGS: 00010246
> > Apr 11 19:25:42 node2 kernel:
> > Apr 11 19:25:42 node2 kernel: EIP is at shm_close [kernel] 0xb0 (2.4.22-1.2199.nptl_ssi_9smp)
> > Apr 11 19:25:42 node2 kernel: eax: d8ad9000 ebx: c05f0fb0 ecx: c05f0fb0 edx: 00000000
> > Apr 11 19:25:42 node2 kernel: esi: 02000000 edi: bd311000 ebp: d3fcde64 esp: d3fcde5c
> > Apr 11 19:25:42 node2 kernel: ds: 0068 es: 0068 ss: 0068
> > Apr 11 19:25:42 node2 kernel: Process httpd (pid: 132848, stackpage=d3fcd000)
> > Apr 11 19:25:42 node2 kernel: Call Trace:
> > Apr 11 19:25:42 node2 kernel: [<c0139c86>] exit_mmap [kernel] 0x166 (0xd3fcde68)
> > Apr 11 19:25:42 node2 kernel: [<c011f3cc>] mmput [kernel] 0x4c (0xd3fcde90)
> > Apr 11 19:25:42 node2 kernel: [<c0125319>] do_exit [kernel] 0xe9 (0xd3fcdea4)
> > Apr 11 19:25:42 node2 kernel: [<c01256e2>] do_group_exit [kernel] 0x32 (0xd3fcdec4)
> > Apr 11 19:25:42 node2 kernel: [<c012ecf4>] get_signal_to_deliver [kernel] 0x2b4 (0xd3fcded8)
> > Apr 11 19:25:42 node2 kernel: [<c01142ff>] restore_i387_fxsave [kernel] 0xaf (0xd3fcdee8)
> > Apr 11 19:25:42 node2 kernel: [<c010b91f>] do_signal [kernel] 0x4f (0xd3fcdf1c)
> > Apr 11 19:25:42 node2 kernel: [<c0109a68>] restore_sigcontext [kernel] 0x458 (0xd3fcdf28)
> > Apr 11 19:25:42 node2 kernel: [<c0109ed6>] sys_sigreturn [kernel] 0x106 (0xd3fcdf90)
> > Apr 11 19:25:42 node2 kernel: [<c010bb10>] signal_return [kernel] 0x14 (0xd3fcdfc0)
> > Apr 11 19:25:42 node2 kernel:
> > Apr 11 19:25:42 node2 kernel: Code: 0f 0b e8 00 ea db 38 c0 eb a5 8d b6 00 00 00 00 a1 c4 0f 5f
> > Apr 11 19:27:17 node2 kernel: ------------[ cut here ]------------
> > Apr 11 19:27:17 node2 kernel: kernel BUG at shm.c:169!
> > Apr 11 19:27:17 node2 kernel: invalid operand: 0000
> > Apr 11 19:27:17 node2 kernel: ipt_REJECT ipt_multiport ipt_state ip_conntrack ipt_TCPMSS iptable_filter ip_tables loop nfsd cls_u32 sch_sfq sch_htb tun microcode ide-cd sr_mod cdrom floppy
> > Apr 11 19:27:17 node2 kernel: CPU: 0
> > Apr 11 19:27:17 node2 kernel: EIP: 0060:[<c01c9930>] Not tainted
> > Apr 11 19:27:17 node2 kernel: EFLAGS: 00010246
> > Apr 11 19:27:17 node2 kernel:
> > Apr 11 19:27:17 node2 kernel: EIP is at shm_open [kernel] 0x60 (2.4.22-1.2199.nptl_ssi_9smp)
> > Apr 11 19:27:17 node2 kernel: eax: d8ad9000 ebx: d40e3c80 ecx: bf000000 edx: 00000000
> > Apr 11 19:27:17 node2 kernel: esi: d5d40600 edi: 00000000 ebp: d4955ec4 esp: d4955ec4
> > Apr 11 19:27:17 node2 kernel: ds: 0068 es: 0068 ss: 0068
> > Apr 11 19:27:17 node2 kernel: Process httpd (pid: 132813, stackpage=d4955000)
> > Apr 11 19:27:17 node2 kernel: Call Trace:
> > Apr 11 19:27:17 node2 kernel: [<c011f8a9>] copy_mm [kernel] 0x389 (0xd4955ec8)
> > Apr 11 19:27:17 node2 kernel: [<c01201f9>] __copy_process [kernel] 0x399 (0xd4955f04)
> > Apr 11 19:27:17 node2 kernel: [<c0120a22>] __do_fork [kernel] 0x52 (0xd4955f4c)
> > Apr 11 19:27:17 node2 kernel: [<c0107e65>] sys_clone [kernel] 0x45 (0xd4955f9c)
> > Apr 11 19:27:17 node2 kernel: [<c010bad7>] system_call [kernel] 0x33 (0xd4955fc0)
> > Apr 11 19:27:17 node2 kernel:
> > Apr 11 19:27:17 node2 kernel: Code: 0f 0b a9 00 ea db 38 c0 eb d2 8d b6 00 00 00 00 a1 c4 0f 5f
> >
> > _______________________________________________
> > ssic-linux-devel mailing list
> > ssi...@li...
> > https://lists.sourceforge.net/lists/listinfo/ssic-linux-devel
> >
>
> Index: ipc/shm.c
> ===================================================================
> RCS file: /cvsroot/ssic-linux/openssi/kernel/ipc/shm.c,v
> retrieving revision 1.2.2.25
> diff -u -p -r1.2.2.25 shm.c
> --- ipc/shm.c 17 Dec 2004 22:21:13 -0000 1.2.2.25
> +++ ipc/shm.c 12 Apr 2005 00:28:05 -0000
> @@ -1235,12 +1235,14 @@ ipc_shm_nodedown(clusternode_t node)
>  			}
>  		}
>  		else {
> +			int id = shp->id;
> +			ipc_get_locks(id, &shm_ids, 1);
>  			shp->shm_flags |= SHM_DEST;
>  			if (shp->shm_nattch == 0) {
> -				ipc_get_locks(shp->id, &shm_ids, 1);
>  				ssi_local_destroy(shp);
> -				up(&shm_ids.sem);
> +				id = 0;
>  			}
> +			ipc_drop_locks(id, &shm_ids, 1);
>  		}
>  	}
>  }
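
For anyone reading the patch without the surrounding source, the diff appears to change two things: the locks are now taken before SHM_DEST is set, rather than only inside the shm_nattch == 0 branch, and every ipc_get_locks() is paired with an ipc_drop_locks(). The sketch below is a userspace analogy in C using pthread mutexes, not OpenSSI code. It assumes (the thread does not confirm this) that ssi_local_destroy() releases the per-segment lock itself, which would explain why the patch sets id = 0 before dropping locks. All names in it (struct segment, segment_init, destroy_locked, mark_for_destroy, SEG_DEST, table_lock) are made up for illustration.

```c
/* Userspace analogy only: pthread mutexes stand in for the kernel locks. */
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

#define SEG_DEST 0x1            /* hypothetical stand-in for SHM_DEST */

struct segment {
    pthread_mutex_t lock;       /* per-segment lock */
    int flags;
    int nattch;                 /* number of current attaches */
    int destroyed;              /* set once the segment is torn down */
};

/* Hypothetical stand-in for the table-wide lock (shm_ids.sem in the patch). */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static void segment_init(struct segment *seg, int nattch)
{
    pthread_mutex_init(&seg->lock, NULL);
    seg->flags = 0;
    seg->nattch = nattch;
    seg->destroyed = 0;
}

/*
 * ASSUMPTION: modelled on what ssi_local_destroy() is guessed to do.
 * The caller must hold seg->lock; this function releases it.
 */
static void destroy_locked(struct segment *seg)
{
    assert(seg->nattch == 0);
    seg->destroyed = 1;
    pthread_mutex_unlock(&seg->lock);
}

/* Same shape as the patched else-branch: lock, flag, maybe destroy, unlock. */
static void mark_for_destroy(struct segment *seg)
{
    int seg_locked = 1;         /* plays the role of "id" in the patch */

    pthread_mutex_lock(&table_lock);
    pthread_mutex_lock(&seg->lock);

    seg->flags |= SEG_DEST;     /* the flag now changes under the locks */
    if (seg->nattch == 0) {
        destroy_locked(seg);    /* drops seg->lock for us */
        seg_locked = 0;         /* like "id = 0": do not unlock it twice */
    }

    if (seg_locked)
        pthread_mutex_unlock(&seg->lock);
    pthread_mutex_unlock(&table_lock);
}
```

Calling mark_for_destroy() on a segment with nattch == 0 tears it down; with nattch > 0 the segment is only flagged. In both cases the two locks are free again on return, which is the property the old code did not guarantee if the assumption above holds.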