Thread: [SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks
Brought to you by:
brucewalker,
rogertsang
From: SourceForge.net <no...@so...> - 2008-04-17 10:23:30
|
Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

Category: IPC
Group: v1.9.3
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90
From: SourceForge.net <no...@so...> - 2008-04-17 11:00:55
|
Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Comment added) made by hughesj

>Comment By: John Hughes (hughesj)
Date: 2008-04-17 13:00

Message:
Logged In: YES
user_id=166336
Originator: YES

Frankly I can't see how this is supposed to work.

In abort_rmid (cluster/ssi/ipc/ipcshm_svr.c around line 418) we have:

        nl = NSC_NODELIST_COPY(svp->shm_nodelist);
        [...]
        while ((node = NSC_NODELIST_GET_NEXT(&cookie, nl)) !=
                                        CLUSTERNODE_INVAL) {
                if (node == this_node)
                        continue;
                ret = RIPC_DROP_LOCKS(node, &rval, id);
        }

Which will call ipc_drop_locks on the remote node, but who has the lock?
And why?
From: SourceForge.net <no...@so...> - 2008-04-17 11:19:22
|
Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Comment added) made by hughesj

>Comment By: John Hughes (hughesj)
Date: 2008-04-17 13:18

Message:
Logged In: YES
user_id=166336
Originator: YES

Well, turned out to be pretty easy to duplicate. Wrote a silly test
program to create, attach and destroy a segment.

Node 1                  Node 2

shmget
shmat
                        shmget
                        shmat
shmctl (IPC_RMID)
                        BUG!
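The create, attach and destroy sequence described in the comment above can be sketched in user space roughly as follows. This is only an illustration: the author's actual test program is not reproduced in this thread, and the function name `shm_cycle`, the 4096-byte segment size, the 0600 permissions and the `hold_secs` delay are all assumptions made for the sketch. Only `shmget`, `shmat`, `shmctl(IPC_RMID)` and `shmdt` are standard System V calls.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the bug report.
 * Create (or open) the segment for `key`, attach it, keep it attached
 * for `hold_secs` seconds, then mark it for removal and detach.
 * Returns 0 on success, -1 if any step fails.
 */
int shm_cycle(key_t key, unsigned int hold_secs)
{
    /* Size and permissions are arbitrary choices for the sketch. */
    int id = shmget(key, 4096, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    void *addr = shmat(id, NULL, 0);
    if (addr == (void *)-1)
        return -1;

    /* Give a process on another node time to get and attach the segment. */
    sleep(hold_secs);

    /* IPC_RMID is the step that triggers the crash on the other node. */
    int rc = shmctl(id, IPC_RMID, NULL);

    if (shmdt(addr) == -1)
        rc = -1;
    return rc;
}
```

Running a helper like this with the same key on two nodes, so that the first node reaches `shmctl(IPC_RMID)` while the second is still attached, would presumably reproduce the ordering listed in the comment. On a single machine the function simply creates and removes a segment.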
From: SourceForge.net <no...@so...> - 2008-04-18 16:27:53
|
Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Comment added) made by hughesj

>Comment By: John Hughes (hughesj)
Date: 2008-04-18 18:27

Message:
Logged In: YES
user_id=166336
Originator: YES

Well, a simple patch seems to be to set the "id" to zero in the call to
ipc_drop_locks in ripc_drop_locks. That means we continue to call
up(&ids->sem) (which we need) but not ipc_unlock (which crashes us
immediately).

Patch attached.

File Added: ripc_drop_locks.patch
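The effect of passing a zero id, as described in the comment above, can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel code or the attached patch: the control flow of `model_ipc_drop_locks` is inferred only from the comment (a nonzero id leads to ipc_unlock, and up(&ids->sem) is called either way), and every `model_` name and type is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects; not the real structures. */
struct model_lock {
    bool locked;
};

struct model_ids {
    int sem_count;                 /* stands in for ids->sem */
    struct model_lock entry_lock;  /* stands in for the per-entry spinlock */
};

/* Returns -1 where the kernel would BUG on unlocking an unlocked lock. */
int model_spin_unlock(struct model_lock *l)
{
    if (!l->locked)
        return -1;
    l->locked = false;
    return 0;
}

/*
 * Assumed shape of ipc_drop_locks, based only on the comment:
 * unlock the entry when id is nonzero, then release the semaphore.
 */
int model_ipc_drop_locks(struct model_ids *ids, int id)
{
    if (id != 0 && model_spin_unlock(&ids->entry_lock) != 0)
        return -1;                 /* the kernel would have crashed here */
    ids->sem_count++;              /* models up(&ids->sem) */
    return 0;
}

/* Walk through both cases on a node that never took the entry lock. */
int model_demo(void)
{
    struct model_ids ids = { 0, { false } };

    /* Nonzero id: the unlock of an unlocked lock is detected. */
    assert(model_ipc_drop_locks(&ids, 1) == -1);

    /* Zero id: the unlock is skipped, the semaphore is still released. */
    assert(model_ipc_drop_locks(&ids, 0) == 0);

    return ids.sem_count;
}
```

Calling `model_demo()` from a `main` exercises both paths. The model only shows why a zero id avoids the immediate crash; it says nothing about whether the remote node ought to hold the lock in the first place, which is the question raised earlier in the thread.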
From: SourceForge.net <no...@so...> - 2008-04-19 07:13:52
|
Bugs item #1944781, was opened at 2008-04-17 06:23
Message generated for change (Comment added) made by rogertsang

>Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 03:13

Message:
Logged In: YES
user_id=1246761
Originator: NO

Race at shm_destroy() going to the wrong svrnode. Try this patch.

File Added: ipc_shm.c.patch
From: SourceForge.net <no...@so...> - 2008-04-19 19:25:47
|
Bugs item #1944781, was opened at 2008-04-17 06:23
Message generated for change (Comment added) made by rogertsang

>Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 15:25

Message:
Logged In: YES
user_id=1246761
Originator: NO

Updated patch for the case where the sys_shmget() client did not register
with the server.

File Added: ipc_shm.c.patch
From: SourceForge.net <no...@so...> - 2008-04-22 12:12:09
|
Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Comment added) made by hughesj

>Comment By: John Hughes (hughesj)
Date: 2008-04-22 14:12

Message:
Logged In: YES
user_id=166336
Originator: YES

I've just tested with current CVS (2008-4-22) which, as far as I can
tell, includes the ipc_shm.c.patch (IPC_SHM_DESTROY_FIX is defined in
linux.h), and I get the same crash.

To duplicate the problem, on a two node system:

    onnode 1 ./shmcrash 123 100 &
    sleep 1
    onnode 2 ./shmcrash 123 120

Node 2 will crash in ipc_unlock when the shmcrash process on node 1
destroys the shared memory segment.

Source for shmcrash program attached.

File Added: shmcrash.c
From: SourceForge.net <no...@so...> - 2008-04-23 07:16:05
|
Bugs item #1944781, was opened at 2008-04-17 06:23
Message generated for change (Comment added) made by rogertsang

>Comment By: Roger Tsang (rogertsang)
Date: 2008-04-23 03:16

Message:
Logged In: YES
user_id=1246761
Originator: NO

Taking a closer look at shm_destroy(), it looks like ssi_shm_cleanup()
did not handle -ERMID. It's kind of late at night, so I haven't tested
either your shmcrash or this patch, but it makes sense.

File Added: IPC_SHM_DESTROY_FIX.patch
From: SourceForge.net <no...@so...> - 2008-05-23 23:44:17
Bugs item #1944781, was opened at 2008-04-17 06:23
Message generated for change (Settings changed) made by rogertsang

Category: IPC
Group: v1.9.3
Status: Open
>Resolution: Accepted
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks
From: SourceForge.net <no...@so...> - 2008-05-24 00:21:13
Bugs item #1944781, was opened at 2008-04-17 06:23
Message generated for change (Comment added) made by rogertsang

Category: IPC
Group: v1.9.3
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

----------------------------------------------------------------------

>Comment By: Roger Tsang (rogertsang)
Date: 2008-05-23 20:21

Message:
Logged In: YES
user_id=1246761
Originator: NO

Latest patch in CVS.

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2008-06-08 08:51:50
Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Settings changed) made by hughesj

Category: IPC
Group: v1.9.3
>Status: Open
Resolution: Fixed
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

----------------------------------------------------------------------

>Comment By: John Hughes (hughesj)
Date: 2008-06-08 10:51

Message:
Logged In: YES
user_id=166336
Originator: YES

Using a kernel build from CVS as of 2008-06-07 I see the same problem - crashes in ipc_unlock called from ripc_drop_locks.

Here's my analysis of the problem:

1. There is an rpc "RIPC_DROP_LOCKS(node, rval, id)" which calls ripc_drop_locks on another node.

2. ripc_drop_locks, which is running in the context of some rpc server on some other node than the process that called RIPC_DROP_LOCKS, does:

        struct shmid_kernel *shp = shm_cli_get(id);

        if (shp)
            shp->shm_perm.mode &= ~SHM_LOCK_DEST;
        ipc_drop_locks(id, (struct kern_ipc_perm *)shp, &shm_ids, 1);

   (id is obviously nonzero here or shm_cli_get would do nothing interesting)

   ipc_drop_locks does:

        if (id > 0)
            ipc_unlock(perm);
        if (table)
            up(&ids->sem);

   (id is nonzero so we must call ipc_unlock)

   and ipc_unlock does:

        spin_unlock(&perm->lock);
        rcu_read_unlock();

So ipc_unlock expects that the calling process will have the spinlock, but ripc_drop_locks *CANNOT* have the spinlock - it is operating in the context of the rpc server, and spinlocks can't be held across sleeps, and the rpc server obviously sleeps between rpc's.

My hacky fix is to set id to zero in the call to ipc_drop_locks. An equivalent fix would be to replace the call to ipc_drop_locks by directly calling up(&ids->sem);

----------------------------------------------------------------------
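The control flow in the 2008-06-08 analysis can be modelled in user space as follows. This is only a toy sketch, not kernel code: toy_perm, toy_ids, toy_ipc_unlock and toy_ipc_drop_locks are hypothetical names, the spinlock is reduced to a boolean "held" flag, the semaphore to a counter, and the kernel BUG is represented by a false return value.

```c
#include <stdbool.h>

/* Toy stand-ins for the kernel objects named in the analysis (hypothetical). */
struct toy_perm {
    bool lock_held;   /* models whether perm->lock is currently held */
};

struct toy_ids {
    int sem;          /* models ids->sem as a plain counter */
};

/*
 * Models ipc_unlock(): releasing a lock that is not held is the case
 * where the real kernel hits BUG in _spin_unlock. Here it returns false.
 */
bool toy_ipc_unlock(struct toy_perm *perm)
{
    if (!perm->lock_held)
        return false;
    perm->lock_held = false;
    return true;
}

/*
 * Mirrors the two tests quoted from ipc_drop_locks():
 * unlock the entry only when id > 0, and release the table
 * semaphore only when table is nonzero.
 */
bool toy_ipc_drop_locks(int id, struct toy_perm *perm,
                        struct toy_ids *ids, int table)
{
    bool ok = true;

    if (id > 0)
        ok = toy_ipc_unlock(perm);
    if (table)
        ids->sem++;   /* stands in for up(&ids->sem) */
    return ok;
}
```

In this model, calling toy_ipc_drop_locks with a nonzero id on an entry whose lock is not held reports the failure, while passing id as 0 (the workaround proposed in the thread) skips the unlock and still releases the semaphore.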
From: SourceForge.net <no...@so...> - 2008-10-20 12:33:41
Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Comment added) made by hughesj

Category: IPC
Group: v1.9.3
Status: Open
Resolution: Fixed
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-10-20 14:22

Message:
Fixed in latest CVS

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2010-03-13 20:00:16
Bugs item #1944781, was opened at 2008-04-17 06:23
Message generated for change (Settings changed) made by rogertsang

Category: IPC
Group: v1.9.3
>Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks