Thread: [SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

Brought to you by: brucewalker, rogertsang

ssic-linux-devel

[SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

From: SourceForge.net <no...@so...> - 2008-04-17 10:23:30

Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IPC
Group: v1.9.3
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

[SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

From: SourceForge.net <no...@so...> - 2008-04-17 11:00:55

Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Comment added) made by hughesj
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IPC
Group: v1.9.3
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90


----------------------------------------------------------------------

>Comment By: John Hughes (hughesj)
Date: 2008-04-17 13:00

Message:
Logged In: YES 
user_id=166336
Originator: YES

Frankly I can't see how this is supposed to work.  In abort_rmid
(cluster/ssi/ipc/ipcshm_svr.c around line 418) we have:

        nl = NSC_NODELIST_COPY(svp->shm_nodelist);
        [...]
        while ((node = NSC_NODELIST_GET_NEXT(&cookie, nl))                
                                      
                                                != CLUSTERNODE_INVAL) {
               if (node == this_node)
                        continue;
                ret = RIPC_DROP_LOCKS(node, &rval, id);
        }

Which will call ipc_drop_locks on the remote node, but who has the lock? 
And why?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

[SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

From: SourceForge.net <no...@so...> - 2008-04-17 11:19:22

Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Comment added) made by hughesj
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IPC
Group: v1.9.3
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90


----------------------------------------------------------------------

>Comment By: John Hughes (hughesj)
Date: 2008-04-17 13:18

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well,turned out to be pretty easy to duplicate.  Wrote a silly test
program to create, attach and destroy a segment.

Node 1              node 2

shmget
smhmat
                    shmget
                    shmat
shmctl (IPC_RMID)

                    BUG!



----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 13:00

Message:
Logged In: YES 
user_id=166336
Originator: YES

Frankly I can't see how this is supposed to work.  In abort_rmid
(cluster/ssi/ipc/ipcshm_svr.c around line 418) we have:

        nl = NSC_NODELIST_COPY(svp->shm_nodelist);
        [...]
        while ((node = NSC_NODELIST_GET_NEXT(&cookie, nl))                
                                      
                                                != CLUSTERNODE_INVAL) {
               if (node == this_node)
                        continue;
                ret = RIPC_DROP_LOCKS(node, &rval, id);
        }

Which will call ipc_drop_locks on the remote node, but who has the lock? 
And why?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

[SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

From: SourceForge.net <no...@so...> - 2008-04-18 16:27:53

Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Comment added) made by hughesj
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IPC
Group: v1.9.3
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90


----------------------------------------------------------------------

>Comment By: John Hughes (hughesj)
Date: 2008-04-18 18:27

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well, a simple patch seems to be set the "id" to zero in the call to
ipc_drop_locks in ripc_drop_locks.  That means we continue to call
up(&ids->sem) (which we need) but not ipc_unlock (which crashes us
immediately).
Patch attached,
File Added: ripc_drop_locks.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 13:18

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well,turned out to be pretty easy to duplicate.  Wrote a silly test
program to create, attach and destroy a segment.

Node 1              node 2

shmget
smhmat
                    shmget
                    shmat
shmctl (IPC_RMID)

                    BUG!



----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 13:00

Message:
Logged In: YES 
user_id=166336
Originator: YES

Frankly I can't see how this is supposed to work.  In abort_rmid
(cluster/ssi/ipc/ipcshm_svr.c around line 418) we have:

        nl = NSC_NODELIST_COPY(svp->shm_nodelist);
        [...]
        while ((node = NSC_NODELIST_GET_NEXT(&cookie, nl))                
                                      
                                                != CLUSTERNODE_INVAL) {
               if (node == this_node)
                        continue;
                ret = RIPC_DROP_LOCKS(node, &rval, id);
        }

Which will call ipc_drop_locks on the remote node, but who has the lock? 
And why?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

[SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

From: SourceForge.net <no...@so...> - 2008-04-19 07:13:52

Bugs item #1944781, was opened at 2008-04-17 06:23
Message generated for change (Comment added) made by rogertsang
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IPC
Group: v1.9.3
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90


----------------------------------------------------------------------

>Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 03:13

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Race at shm_destroy() going to the wrong svrnode. Try this patch.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-18 12:27

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well, a simple patch seems to be set the "id" to zero in the call to
ipc_drop_locks in ripc_drop_locks.  That means we continue to call
up(&ids->sem) (which we need) but not ipc_unlock (which crashes us
immediately).
Patch attached,
File Added: ripc_drop_locks.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 07:18

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well,turned out to be pretty easy to duplicate.  Wrote a silly test
program to create, attach and destroy a segment.

Node 1              node 2

shmget
smhmat
                    shmget
                    shmat
shmctl (IPC_RMID)

                    BUG!



----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 07:00

Message:
Logged In: YES 
user_id=166336
Originator: YES

Frankly I can't see how this is supposed to work.  In abort_rmid
(cluster/ssi/ipc/ipcshm_svr.c around line 418) we have:

        nl = NSC_NODELIST_COPY(svp->shm_nodelist);
        [...]
        while ((node = NSC_NODELIST_GET_NEXT(&cookie, nl))                
                                      
                                                != CLUSTERNODE_INVAL) {
               if (node == this_node)
                        continue;
                ret = RIPC_DROP_LOCKS(node, &rval, id);
        }

Which will call ipc_drop_locks on the remote node, but who has the lock? 
And why?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

[SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

From: SourceForge.net <no...@so...> - 2008-04-19 19:25:47

Bugs item #1944781, was opened at 2008-04-17 06:23
Message generated for change (Comment added) made by rogertsang
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IPC
Group: v1.9.3
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90


----------------------------------------------------------------------

>Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 15:25

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Updated patch for sys_shmget() client did not register with server.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 03:13

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Race at shm_destroy() going to the wrong svrnode. Try this patch.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-18 12:27

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well, a simple patch seems to be set the "id" to zero in the call to
ipc_drop_locks in ripc_drop_locks.  That means we continue to call
up(&ids->sem) (which we need) but not ipc_unlock (which crashes us
immediately).
Patch attached,
File Added: ripc_drop_locks.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 07:18

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well,turned out to be pretty easy to duplicate.  Wrote a silly test
program to create, attach and destroy a segment.

Node 1              node 2

shmget
smhmat
                    shmget
                    shmat
shmctl (IPC_RMID)

                    BUG!



----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 07:00

Message:
Logged In: YES 
user_id=166336
Originator: YES

Frankly I can't see how this is supposed to work.  In abort_rmid
(cluster/ssi/ipc/ipcshm_svr.c around line 418) we have:

        nl = NSC_NODELIST_COPY(svp->shm_nodelist);
        [...]
        while ((node = NSC_NODELIST_GET_NEXT(&cookie, nl))                
                                      
                                                != CLUSTERNODE_INVAL) {
               if (node == this_node)
                        continue;
                ret = RIPC_DROP_LOCKS(node, &rval, id);
        }

Which will call ipc_drop_locks on the remote node, but who has the lock? 
And why?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

[SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

From: SourceForge.net <no...@so...> - 2008-04-22 12:12:09

Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Comment added) made by hughesj
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IPC
Group: v1.9.3
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90


----------------------------------------------------------------------

>Comment By: John Hughes (hughesj)
Date: 2008-04-22 14:12

Message:
Logged In: YES 
user_id=166336
Originator: YES

I've just tested with current CVS (2008-4-22) which, as far as I can tell
includes the ipc_shm.c.patch, IPC_SHM_DESTROY_FIX is defined in linux.h and
I get the same crash.

To duplicate the problem, on a two node system:

onnode 1 ./shmcrash 123 100 & 
sleep 1
onnode 2 ./shmcrash 123 120 

Node 2 will crash in ipc_unlock when the shmcrash process on node 1
destroys the shared memory segment.

Source for shmcrash program attached.

File Added: shmcrash.c

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 21:25

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Updated patch for sys_shmget() client did not register with server.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 09:13

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Race at shm_destroy() going to the wrong svrnode. Try this patch.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-18 18:27

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well, a simple patch seems to be set the "id" to zero in the call to
ipc_drop_locks in ripc_drop_locks.  That means we continue to call
up(&ids->sem) (which we need) but not ipc_unlock (which crashes us
immediately).
Patch attached,
File Added: ripc_drop_locks.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 13:18

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well,turned out to be pretty easy to duplicate.  Wrote a silly test
program to create, attach and destroy a segment.

Node 1              node 2

shmget
smhmat
                    shmget
                    shmat
shmctl (IPC_RMID)

                    BUG!



----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 13:00

Message:
Logged In: YES 
user_id=166336
Originator: YES

Frankly I can't see how this is supposed to work.  In abort_rmid
(cluster/ssi/ipc/ipcshm_svr.c around line 418) we have:

        nl = NSC_NODELIST_COPY(svp->shm_nodelist);
        [...]
        while ((node = NSC_NODELIST_GET_NEXT(&cookie, nl))                
                                      
                                                != CLUSTERNODE_INVAL) {
               if (node == this_node)
                        continue;
                ret = RIPC_DROP_LOCKS(node, &rval, id);
        }

Which will call ipc_drop_locks on the remote node, but who has the lock? 
And why?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

[SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

From: SourceForge.net <no...@so...> - 2008-04-23 07:16:05

Bugs item #1944781, was opened at 2008-04-17 06:23
Message generated for change (Comment added) made by rogertsang
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IPC
Group: v1.9.3
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90


----------------------------------------------------------------------

>Comment By: Roger Tsang (rogertsang)
Date: 2008-04-23 03:16

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Taking a closer look at shm_destroy() and looks like ssi_shm_cleanup() did
not handle -ERMID.  Kinda late night, so haven't tested either your
shmcrash or this patch but it makes sense.
File Added: IPC_SHM_DESTROY_FIX.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-22 08:12

Message:
Logged In: YES 
user_id=166336
Originator: YES

I've just tested with current CVS (2008-4-22) which, as far as I can tell
includes the ipc_shm.c.patch, IPC_SHM_DESTROY_FIX is defined in linux.h and
I get the same crash.

To duplicate the problem, on a two node system:

onnode 1 ./shmcrash 123 100 & 
sleep 1
onnode 2 ./shmcrash 123 120 

Node 2 will crash in ipc_unlock when the shmcrash process on node 1
destroys the shared memory segment.

Source for shmcrash program attached.

File Added: shmcrash.c

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 15:25

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Updated patch for sys_shmget() client did not register with server.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 03:13

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Race at shm_destroy() going to the wrong svrnode. Try this patch.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-18 12:27

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well, a simple patch seems to be set the "id" to zero in the call to
ipc_drop_locks in ripc_drop_locks.  That means we continue to call
up(&ids->sem) (which we need) but not ipc_unlock (which crashes us
immediately).
Patch attached,
File Added: ripc_drop_locks.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 07:18

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well,turned out to be pretty easy to duplicate.  Wrote a silly test
program to create, attach and destroy a segment.

Node 1              node 2

shmget
smhmat
                    shmget
                    shmat
shmctl (IPC_RMID)

                    BUG!



----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 07:00

Message:
Logged In: YES 
user_id=166336
Originator: YES

Frankly I can't see how this is supposed to work.  In abort_rmid
(cluster/ssi/ipc/ipcshm_svr.c around line 418) we have:

        nl = NSC_NODELIST_COPY(svp->shm_nodelist);
        [...]
        while ((node = NSC_NODELIST_GET_NEXT(&cookie, nl))                
                                      
                                                != CLUSTERNODE_INVAL) {
               if (node == this_node)
                        continue;
                ret = RIPC_DROP_LOCKS(node, &rval, id);
        }

Which will call ipc_drop_locks on the remote node, but who has the lock? 
And why?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

[SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

From: SourceForge.net <no...@so...> - 2008-05-23 23:44:17

Bugs item #1944781, was opened at 2008-04-17 06:23
Message generated for change (Settings changed) made by rogertsang
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IPC
Group: v1.9.3
Status: Open
>Resolution: Accepted
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90


----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-23 03:16

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Taking a closer look at shm_destroy() and looks like ssi_shm_cleanup() did
not handle -ERMID.  Kinda late night, so haven't tested either your
shmcrash or this patch but it makes sense.
File Added: IPC_SHM_DESTROY_FIX.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-22 08:12

Message:
Logged In: YES 
user_id=166336
Originator: YES

I've just tested with current CVS (2008-4-22) which, as far as I can tell
includes the ipc_shm.c.patch, IPC_SHM_DESTROY_FIX is defined in linux.h and
I get the same crash.

To duplicate the problem, on a two node system:

onnode 1 ./shmcrash 123 100 & 
sleep 1
onnode 2 ./shmcrash 123 120 

Node 2 will crash in ipc_unlock when the shmcrash process on node 1
destroys the shared memory segment.

Source for shmcrash program attached.

File Added: shmcrash.c

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 15:25

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Updated patch for sys_shmget() client did not register with server.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 03:13

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Race at shm_destroy() going to the wrong svrnode. Try this patch.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-18 12:27

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well, a simple patch seems to be set the "id" to zero in the call to
ipc_drop_locks in ripc_drop_locks.  That means we continue to call
up(&ids->sem) (which we need) but not ipc_unlock (which crashes us
immediately).
Patch attached,
File Added: ripc_drop_locks.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 07:18

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well,turned out to be pretty easy to duplicate.  Wrote a silly test
program to create, attach and destroy a segment.

Node 1              node 2

shmget
smhmat
                    shmget
                    shmat
shmctl (IPC_RMID)

                    BUG!



----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 07:00

Message:
Logged In: YES 
user_id=166336
Originator: YES

Frankly I can't see how this is supposed to work.  In abort_rmid
(cluster/ssi/ipc/ipcshm_svr.c around line 418) we have:

        nl = NSC_NODELIST_COPY(svp->shm_nodelist);
        [...]
        while ((node = NSC_NODELIST_GET_NEXT(&cookie, nl))                
                                      
                                                != CLUSTERNODE_INVAL) {
               if (node == this_node)
                        continue;
                ret = RIPC_DROP_LOCKS(node, &rval, id);
        }

Which will call ipc_drop_locks on the remote node, but who has the lock? 
And why?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

[SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

From: SourceForge.net <no...@so...> - 2008-05-24 00:21:13

Bugs item #1944781, was opened at 2008-04-17 06:23
Message generated for change (Comment added) made by rogertsang
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IPC
Group: v1.9.3
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90


----------------------------------------------------------------------

>Comment By: Roger Tsang (rogertsang)
Date: 2008-05-23 20:21

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Latest patch in CVS.

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-23 03:16

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Taking a closer look at shm_destroy() and looks like ssi_shm_cleanup() did
not handle -ERMID.  Kinda late night, so haven't tested either your
shmcrash or this patch but it makes sense.
File Added: IPC_SHM_DESTROY_FIX.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-22 08:12

Message:
Logged In: YES 
user_id=166336
Originator: YES

I've just tested with current CVS (2008-4-22) which, as far as I can tell
includes the ipc_shm.c.patch, IPC_SHM_DESTROY_FIX is defined in linux.h and
I get the same crash.

To duplicate the problem, on a two node system:

onnode 1 ./shmcrash 123 100 & 
sleep 1
onnode 2 ./shmcrash 123 120 

Node 2 will crash in ipc_unlock when the shmcrash process on node 1
destroys the shared memory segment.

Source for shmcrash program attached.

File Added: shmcrash.c

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 15:25

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Updated patch for sys_shmget() client did not register with server.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 03:13

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Race at shm_destroy() going to the wrong svrnode. Try this patch.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-18 12:27

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well, a simple patch seems to be set the "id" to zero in the call to
ipc_drop_locks in ripc_drop_locks.  That means we continue to call
up(&ids->sem) (which we need) but not ipc_unlock (which crashes us
immediately).
Patch attached,
File Added: ripc_drop_locks.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 07:18

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well,turned out to be pretty easy to duplicate.  Wrote a silly test
program to create, attach and destroy a segment.

Node 1              node 2

shmget
smhmat
                    shmget
                    shmat
shmctl (IPC_RMID)

                    BUG!



----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 07:00

Message:
Logged In: YES 
user_id=166336
Originator: YES

Frankly I can't see how this is supposed to work.  In abort_rmid
(cluster/ssi/ipc/ipcshm_svr.c around line 418) we have:

        nl = NSC_NODELIST_COPY(svp->shm_nodelist);
        [...]
        while ((node = NSC_NODELIST_GET_NEXT(&cookie, nl))                
                                      
                                                != CLUSTERNODE_INVAL) {
               if (node == this_node)
                        continue;
                ret = RIPC_DROP_LOCKS(node, &rval, id);
        }

Which will call ipc_drop_locks on the remote node, but who has the lock? 
And why?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

[SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

From: SourceForge.net <no...@so...> - 2008-06-08 08:51:50

Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Settings changed) made by hughesj
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IPC
Group: v1.9.3
>Status: Open
Resolution: Fixed
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90


----------------------------------------------------------------------

>Comment By: John Hughes (hughesj)
Date: 2008-06-08 10:51

Message:
Logged In: YES 
user_id=166336
Originator: YES

Using a kernel build from CVS as of 2008-06-07 I see the same problem -
crashes in ipc_unlock called from ripc_drop locks.

Here's my analysis of the problem:

1. There is a rpc "RIPC_DROP_LOCKS(node, rval, id)" which calls
ripc_drop_locks on another node.

2. ripc_drop_locks, which is running in the context of some rpc server on
some other node than the process that called RIPC_DROP_LOCKS does:

        struct shmid_kernel *shp = shm_cli_get(id);
        if (shp)
                shp->shm_perm.mode &= ~SHM_LOCK_DEST;
        ipc_drop_locks(id, (struct kern_ipc_perm *)shp, &shm_ids, 1);

(id is obviously nonzero here or shm_cli_get would do nothing
interesting)

ipc_drop_locks does:

        if (id > 0)
                ipc_unlock(perm);
        if (table)
                up(&ids->sem);

(id is nonzero so we must call ipc_unlock)

and ipc_unlock does:

        spin_unlock(&perm->lock);
        rcu_read_unlock();

So ipc_unlock expects that the calling process will have the spinlock, but
ripc_unlock_locks *CANNOT* have the spinlock - it is operating in the
context of the rpc server, and spinlocks can't be held across sleeps, and
the rpc server obviously sleeps between rpc's.

My hacky fix is to set id to zero in the call to ipc_drop_locks.  An
equivalent fix would to be to replace the call to ipc_drop_locks by
directly calling  up(&ids->sem);


----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-05-24 02:21

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Latest patch in CVS.

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-23 09:16

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Taking a closer look at shm_destroy() and looks like ssi_shm_cleanup() did
not handle -ERMID.  Kinda late night, so haven't tested either your
shmcrash or this patch but it makes sense.
File Added: IPC_SHM_DESTROY_FIX.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-22 14:12

Message:
Logged In: YES 
user_id=166336
Originator: YES

I've just tested with current CVS (2008-4-22) which, as far as I can tell
includes the ipc_shm.c.patch, IPC_SHM_DESTROY_FIX is defined in linux.h and
I get the same crash.

To duplicate the problem, on a two node system:

onnode 1 ./shmcrash 123 100 & 
sleep 1
onnode 2 ./shmcrash 123 120 

Node 2 will crash in ipc_unlock when the shmcrash process on node 1
destroys the shared memory segment.

Source for shmcrash program attached.

File Added: shmcrash.c

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 21:25

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Updated patch for sys_shmget() client did not register with server.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 09:13

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Race at shm_destroy() going to the wrong svrnode. Try this patch.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-18 18:27

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well, a simple patch seems to be set the "id" to zero in the call to
ipc_drop_locks in ripc_drop_locks.  That means we continue to call
up(&ids->sem) (which we need) but not ipc_unlock (which crashes us
immediately).
Patch attached,
File Added: ripc_drop_locks.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 13:18

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well,turned out to be pretty easy to duplicate.  Wrote a silly test
program to create, attach and destroy a segment.

Node 1              node 2

shmget
smhmat
                    shmget
                    shmat
shmctl (IPC_RMID)

                    BUG!



----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 13:00

Message:
Logged In: YES 
user_id=166336
Originator: YES

Frankly I can't see how this is supposed to work.  In abort_rmid
(cluster/ssi/ipc/ipcshm_svr.c around line 418) we have:

        nl = NSC_NODELIST_COPY(svp->shm_nodelist);
        [...]
        while ((node = NSC_NODELIST_GET_NEXT(&cookie, nl))                
                                      
                                                != CLUSTERNODE_INVAL) {
               if (node == this_node)
                        continue;
                ret = RIPC_DROP_LOCKS(node, &rval, id);
        }

Which will call ipc_drop_locks on the remote node, but who has the lock? 
And why?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

[SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

From: SourceForge.net <no...@so...> - 2008-10-20 12:33:41

Bugs item #1944781, was opened at 2008-04-17 12:23
Message generated for change (Comment added) made by hughesj
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IPC
Group: v1.9.3
Status: Open
Resolution: Fixed
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90


----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-10-20 14:22

Message:
Fixed in latest CVS

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-06-08 10:51

Message:
Logged In: YES 
user_id=166336
Originator: YES

Using a kernel build from CVS as of 2008-06-07 I see the same problem -
crashes in ipc_unlock called from ripc_drop locks.

Here's my analysis of the problem:

1. There is a rpc "RIPC_DROP_LOCKS(node, rval, id)" which calls
ripc_drop_locks on another node.

2. ripc_drop_locks, which is running in the context of some rpc server on
some other node than the process that called RIPC_DROP_LOCKS does:

        struct shmid_kernel *shp = shm_cli_get(id);
        if (shp)
                shp->shm_perm.mode &= ~SHM_LOCK_DEST;
        ipc_drop_locks(id, (struct kern_ipc_perm *)shp, &shm_ids, 1);

(id is obviously nonzero here or shm_cli_get would do nothing
interesting)

ipc_drop_locks does:

        if (id > 0)
                ipc_unlock(perm);
        if (table)
                up(&ids->sem);

(id is nonzero so we must call ipc_unlock)

and ipc_unlock does:

        spin_unlock(&perm->lock);
        rcu_read_unlock();

So ipc_unlock expects that the calling process will have the spinlock, but
ripc_unlock_locks *CANNOT* have the spinlock - it is operating in the
context of the rpc server, and spinlocks can't be held across sleeps, and
the rpc server obviously sleeps between rpc's.

My hacky fix is to set id to zero in the call to ipc_drop_locks.  An
equivalent fix would to be to replace the call to ipc_drop_locks by
directly calling  up(&ids->sem);


----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-05-24 02:21

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Latest patch in CVS.

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-23 09:16

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Taking a closer look at shm_destroy() and looks like ssi_shm_cleanup() did
not handle -ERMID.  Kinda late night, so haven't tested either your
shmcrash or this patch but it makes sense.
File Added: IPC_SHM_DESTROY_FIX.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-22 14:12

Message:
Logged In: YES 
user_id=166336
Originator: YES

I've just tested with current CVS (2008-4-22) which, as far as I can tell
includes the ipc_shm.c.patch, IPC_SHM_DESTROY_FIX is defined in linux.h and
I get the same crash.

To duplicate the problem, on a two node system:

onnode 1 ./shmcrash 123 100 & 
sleep 1
onnode 2 ./shmcrash 123 120 

Node 2 will crash in ipc_unlock when the shmcrash process on node 1
destroys the shared memory segment.

Source for shmcrash program attached.

File Added: shmcrash.c

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 21:25

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Updated patch for sys_shmget() client did not register with server.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 09:13

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Race at shm_destroy() going to the wrong svrnode. Try this patch.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-18 18:27

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well, a simple patch seems to be set the "id" to zero in the call to
ipc_drop_locks in ripc_drop_locks.  That means we continue to call
up(&ids->sem) (which we need) but not ipc_unlock (which crashes us
immediately).
Patch attached,
File Added: ripc_drop_locks.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 13:18

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well,turned out to be pretty easy to duplicate.  Wrote a silly test
program to create, attach and destroy a segment.

Node 1              node 2

shmget
smhmat
                    shmget
                    shmat
shmctl (IPC_RMID)

                    BUG!



----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 13:00

Message:
Logged In: YES 
user_id=166336
Originator: YES

Frankly I can't see how this is supposed to work.  In abort_rmid
(cluster/ssi/ipc/ipcshm_svr.c around line 418) we have:

        nl = NSC_NODELIST_COPY(svp->shm_nodelist);
        [...]
        while ((node = NSC_NODELIST_GET_NEXT(&cookie, nl))                
                                      
                                                != CLUSTERNODE_INVAL) {
               if (node == this_node)
                        continue;
                ret = RIPC_DROP_LOCKS(node, &rval, id);
        }

Which will call ipc_drop_locks on the remote node, but who has the lock? 
And why?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

[SSI-devel] [ ssic-linux-Bugs-1944781 ] unlocking unlocked lock in ripc_drop_locks

From: SourceForge.net <no...@so...> - 2010-03-13 20:00:16

Bugs item #1944781, was opened at 2008-04-17 06:23
Message generated for change (Settings changed) made by rogertsang
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IPC
Group: v1.9.3
>Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: unlocking unlocked lock in ripc_drop_locks

Initial Comment:
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:112!
invalid operand: 0000 [#1]
SMP
Modules linked in: i915 drm button ac battery parport_pc parport floppy pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix libata hw_random ehci_hcd uhci_hcd sr_mod sd_mod mptsas mptscsih mptbase scsi_mod tg3 e1000
CPU:    0
EIP:    0060:[<c046290b>]    Not tainted VLI
EFLAGS: 00010202   (2.6.11-jh-1)
EIP is at _spin_unlock+0x1b/0x30
eax: 00000001   ebx: c0750140   ecx: c0750101   edx: f7e12e08
esi: f70c2400   edi: c0753360   ebp: f7032f10   esp: f7032f10
ds: 007b   es: 007b   ss: 0068
Process icssvr_daemon (pid: 197135, threadinfo=f7032000 task=f70cd930)
Stack: f7032f18 c01cecbb f7032f28 c01ce77e f7e12e08 02668001 f7032f44 c0261dd5
       02668001 f7e12e08 c0750140 00000001 f7032f5c f7032f6c c0258708 00000003
       f7032f5c 02668001 00000000 00000000 02668001 00000002 00000002 f7032fec
Call Trace:
 [<c010694f>] show_stack+0x7f/0xa0
 [<c0106b04>] show_registers+0x164/0x220
 [<c0106e94>] die+0xf4/0x1c0
 [<c0107015>] do_trap+0xb5/0xc0
 [<c01072cc>] do_invalid_op+0xbc/0xd0
 [<c01065a3>] error_code+0x2b/0x30
 [<c01cecbb>] ipc_unlock+0xb/0x10
 [<c01ce77e>] ipc_drop_locks+0x1e/0x40
 [<c0261dd5>] ripc_drop_locks+0x45/0x60
 [<c0258708>] svr_ripc_drop_locks+0x58/0xb0
 [<c020abb3>] icssvr_daemon+0x2f3/0xab0
 [<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 1c 0c 49 c0 eb e6 8d 76 00 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de b1 01 75 15 0f b6 02 84 c0 7f 04 86 0a 5d c3 <0f> 0b 70 00 1c 0c 49 c0 eb f2 0f 0b 6f 00 1c 0c 49 c0 eb e1 90


----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-10-20 08:22

Message:
Fixed in latest CVS

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-06-08 04:51

Message:
Logged In: YES 
user_id=166336
Originator: YES

Using a kernel build from CVS as of 2008-06-07 I see the same problem -
crashes in ipc_unlock called from ripc_drop locks.

Here's my analysis of the problem:

1. There is a rpc "RIPC_DROP_LOCKS(node, rval, id)" which calls
ripc_drop_locks on another node.

2. ripc_drop_locks, which is running in the context of some rpc server on
some other node than the process that called RIPC_DROP_LOCKS does:

        struct shmid_kernel *shp = shm_cli_get(id);
        if (shp)
                shp->shm_perm.mode &= ~SHM_LOCK_DEST;
        ipc_drop_locks(id, (struct kern_ipc_perm *)shp, &shm_ids, 1);

(id is obviously nonzero here or shm_cli_get would do nothing
interesting)

ipc_drop_locks does:

        if (id > 0)
                ipc_unlock(perm);
        if (table)
                up(&ids->sem);

(id is nonzero so we must call ipc_unlock)

and ipc_unlock does:

        spin_unlock(&perm->lock);
        rcu_read_unlock();

So ipc_unlock expects that the calling process will have the spinlock, but
ripc_unlock_locks *CANNOT* have the spinlock - it is operating in the
context of the rpc server, and spinlocks can't be held across sleeps, and
the rpc server obviously sleeps between rpc's.

My hacky fix is to set id to zero in the call to ipc_drop_locks.  An
equivalent fix would to be to replace the call to ipc_drop_locks by
directly calling  up(&ids->sem);


----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-05-23 20:21

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Latest patch in CVS.

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-23 03:16

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Taking a closer look at shm_destroy() and looks like ssi_shm_cleanup() did
not handle -ERMID.  Kinda late night, so haven't tested either your
shmcrash or this patch but it makes sense.
File Added: IPC_SHM_DESTROY_FIX.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-22 08:12

Message:
Logged In: YES 
user_id=166336
Originator: YES

I've just tested with current CVS (2008-4-22) which, as far as I can tell
includes the ipc_shm.c.patch, IPC_SHM_DESTROY_FIX is defined in linux.h and
I get the same crash.

To duplicate the problem, on a two node system:

onnode 1 ./shmcrash 123 100 & 
sleep 1
onnode 2 ./shmcrash 123 120 

Node 2 will crash in ipc_unlock when the shmcrash process on node 1
destroys the shared memory segment.

Source for shmcrash program attached.

File Added: shmcrash.c

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 15:25

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Updated patch for sys_shmget() client did not register with server.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-04-19 03:13

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Race at shm_destroy() going to the wrong svrnode. Try this patch.
File Added: ipc_shm.c.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-18 12:27

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well, a simple patch seems to be set the "id" to zero in the call to
ipc_drop_locks in ripc_drop_locks.  That means we continue to call
up(&ids->sem) (which we need) but not ipc_unlock (which crashes us
immediately).
Patch attached,
File Added: ripc_drop_locks.patch

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 07:18

Message:
Logged In: YES 
user_id=166336
Originator: YES

Well,turned out to be pretty easy to duplicate.  Wrote a silly test
program to create, attach and destroy a segment.

Node 1              node 2

shmget
smhmat
                    shmget
                    shmat
shmctl (IPC_RMID)

                    BUG!



----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-04-17 07:00

Message:
Logged In: YES 
user_id=166336
Originator: YES

Frankly I can't see how this is supposed to work.  In abort_rmid
(cluster/ssi/ipc/ipcshm_svr.c around line 418) we have:

        nl = NSC_NODELIST_COPY(svp->shm_nodelist);
        [...]
        while ((node = NSC_NODELIST_GET_NEXT(&cookie, nl))                
                                      
                                                != CLUSTERNODE_INVAL) {
               if (node == this_node)
                        continue;
                ret = RIPC_DROP_LOCKS(node, &rval, id);
        }

Which will call ipc_drop_locks on the remote node, but who has the lock? 
And why?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1944781&group_id=32541