OpenSSI Clusters for Linux

Tracker: Bugs

5 stuck while tracing ssi-ksync - ID: 2719752
Last Update: Settings changed ( rogertsang )

From John Hughes (March 16):

# strace -f -o zz /sbin/ssi-ksync
...
pvpop_rmv_child_from_parent: waiting for parent lock 68272
pvpop_rmv_child_from_parent: waiting for parent lock 68272
pvpop_rmv_child_from_parent: waiting for parent lock 68272
pvpop_rmv_child_from_parent: waiting for parent lock 68272
...


Submitted: Roger Tsang ( rogertsang ) - 2009-03-28 22:12
Priority: 5
Status: Open
Resolution: Fixed
Assigned: Roger Tsang
Category: Process Management
Group: v2.0.0pre2
Visibility: Public


Comments ( 3 )




Date: 2009-10-27 03:48
Sender: rogertsang (Project Admin)

checked-in


Date: 2009-07-04 20:54
Sender: rogertsang (Project Admin)

This pvpop_rmv_child_from_parent() was called remotely from the
__ptrace_unlink() path and is stuck waiting for a lock held by the strace
process, which is itself waiting for its remote child to get reaped.
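The circular wait described above can be sketched as a small wait-for graph. This is an illustrative model only, not OpenSSI code: the node labels are made up for the sketch, and the third edge (the child cannot be reaped until the remote call returns) is an assumption inferred from the comment rather than something stated in it.

```python
# Hypothetical wait-for graph: each key is blocked waiting on its value.
WAITS_FOR = {
    # Stated in the comment: the remote call waits for the parent lock.
    "pvpop_rmv_child_from_parent (remote call)": "strace (holds parent lock)",
    # Stated in the comment: strace waits for its remote child to be reaped.
    "strace (holds parent lock)": "remote child reaped",
    # Assumption: reaping cannot finish until the __ptrace_unlink() path
    # that issued the remote call returns.
    "remote child reaped": "pvpop_rmv_child_from_parent (remote call)",
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    seen = []
    node = start
    while node is not None:
        if node in seen:
            # We came back to a node already on the path: that is the cycle.
            return seen[seen.index(node):]
        seen.append(node)
        node = waits_for.get(node)
    return None


cycle = find_cycle(WAITS_FOR, "strace (holds parent lock)")
if cycle:
    print("deadlock: " + " -> ".join(cycle + [cycle[0]]))
```

If the assumed third edge is removed, find_cycle() returns None, which is why that edge is the part worth confirming in the kernel source.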


Date: 2009-03-28 22:16
Sender: rogertsang (Project Admin)

From John Hughes (March 16):

Further investigation shows that the problem is happening in
mkdhcpd.conf:

# strace -f -o zz /sbin/mkdhcpd.conf
pvpop_rmv_child_from_parent: waiting for parent lock xxxxx


If I power-off the node that's waiting for the parent lock then the other
node crashes:

Taking over master from node 2.
Node 2 has gone down!!!
passed the first scan in ipcname_pull_data
num_objects[MSG] = 0
num_objects[SEM] = 0
num_objects[SHM] = 0
ipcnameserver ready completed
write handler down off 2400000 len 4096

EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
fsck 1.37 (21-Mar-2005)
EXT3 FS on hdb1, internal journal
/etc/init.d/rc.sysrecover running
ptrace_unlink: vpop_reclaim failed
Unable to handle kernel NULL pointer dereference at virtual address 00000008
printing eip:
c0241911
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: parport_pc parport floppy uhci_hcd ohci_hcd ehci_hcd i2c_piix4 i2c_core ide_scsi scsi_mod ext3 jbd ne2k_pci 8390
CPU: 0
EIP: 0060:[<c0241911>] Not tainted VLI
EFLAGS: 00000282 (2.6.11-ssi-686-smp)
EIP is at pvpop_reassign_child+0x281/0x920
eax: 00000000 ebx: 0002074b ecx: 00000000 edx: cea561f0
esi: 0002074c edi: cf6ae000 ebp: cf6afebc esp: cf6afe1c
ds: 007b es: 007b ss: 0068
Process VPROC Slave Dae (pid: 67247, threadinfo=cf6ae000 task=cf980b10)
Stack: 0002074b 00000001 c04cffc1 00000001 cf6afe4c c01540a2 00000014 00000000
       00000000 cf532040 cf532000 00000000 cea561f0 c0155036 cf6afe78 c025c57f
       00000001 000008d0 cf9ef700 00000010 cea56cc0 00000001 00000001 cf6afe94
Call Trace:
[<c010679f>] show_stack+0x7f/0xa0
[<c0106944>] show_registers+0x164/0x230
[<c0106cf4>] die+0xf4/0x1c0
[<c011f39d>] do_page_fault+0x46d/0x669
[<c0106403>] error_code+0x2b/0x30
[<c025965a>] vproc_child_lost_parent+0x7a/0x160
[<c0259ead>] vproc_lost_relations+0xad/0x218
[<c024d554>] pvpsop_lost_relations+0x94/0xa0
[<c025956e>] vproc_carelist_cleanup+0x7e/0xf0
[<c0256933>] vproc_origin_node_cleanup+0x63/0x2b0
[<c0256dde>] vproc_origin_cleanup+0xbe/0x190
[<c0257fc1>] vproc_slave_daemon+0x1a1/0x260
[<c01023a5>] kernel_thread_helper+0x5/0x10
Code: 8b 4d 8c 89 0c 24 e8 6f a9 01 00 ba 01 00 00 00 89 54 24 04 8b 55 90
8b 42 2c 89 04 24 e8 88 a6 01 00 89 45 8c 8b 4d 8c 8b 55 90 <8b> 41 08 39
42 28 0f 84 12 03 00 00 8b 85 7c ff ff ff 85 c0 75
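
The faulting address can be cross-checked against the Code: line. In a kernel oops the byte in angle brackets marks the instruction at EIP, here `8b 41 08`. The sketch below decodes only this one instruction form (opcode 8b with an 8-bit displacement) and is not a general disassembler; the function and variable names are made up for the illustration, and the register values are copied from the dump above.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp8(code):
    """Decode a 3-byte `8b /r` (mov r32, r/m32) with an 8-bit displacement.

    Returns (destination register, base register, signed displacement).
    """
    opcode, modrm, disp = code
    if opcode != 0x8B:
        raise ValueError("not an 8b mov")
    mod = modrm >> 6          # addressing mode; 1 means [reg + disp8]
    reg = (modrm >> 3) & 7    # destination register
    rm = modrm & 7            # base register
    if mod != 1 or rm == 4:   # rm == 4 would mean a SIB byte follows
        raise ValueError("form not handled by this sketch")
    if disp >= 0x80:          # sign-extend the 8-bit displacement
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


# Register values taken from the oops output.
oops_regs = {"eax": 0x00000000, "ebx": 0x0002074B,
             "ecx": 0x00000000, "edx": 0xCEA561F0}

dest, base, disp = decode_mov_disp8(bytes.fromhex("8b4108"))
fault_addr = (oops_regs[base] + disp) & 0xFFFFFFFF
print(f"mov {dest}, [{base}{disp:+#x}] reads address {fault_addr:08x}")
```

The decoded instruction loads from ecx+8, and ecx is 00000000 in the register dump, which agrees with the reported fault at virtual address 00000008: a field at offset 8 read through a NULL pointer inside pvpop_reassign_child.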


Attached File

No Files Currently Attached

Changes ( 4 )

Field              Old Value  Date              By
resolution_id      Accepted   2009-10-27 03:48  rogertsang
assigned_to        nobody     2009-07-04 20:54  rogertsang
artifact_group_id  None       2009-07-04 20:54  rogertsang
resolution_id      None       2009-07-04 20:54  rogertsang