Maybe instead of only calling SSI_ASSERT() in rmtunix.c::rmtunix_dgram_sendmsg(), we could also return -EREMOTE so that af_unix.c::unix_dgram_sendmsg() tries again, with a limited number of retries.
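
To make the idea concrete, here is a rough userspace sketch of the bounded-retry pattern. It is not the actual OpenSSI code: MAX_EREMOTE_RETRIES, send_fn, send_with_bounded_retry(), struct fake_peer and fake_send() are all made-up names for illustration, and the cap of 3 is arbitrary. The real change would sit in the retry path of unix_dgram_sendmsg() and would need to redo the peer lookup before each attempt.

```c
#include <errno.h>

/* Hypothetical cap; the proposal only says "limited retries". */
#define MAX_EREMOTE_RETRIES 3

/* Stand-in for the remote send path: returns 0 or a negative errno. */
typedef int (*send_fn)(void *ctx);

/*
 * Call send() once, then retry only while it reports -EREMOTE,
 * for at most MAX_EREMOTE_RETRIES additional attempts.
 * Any other result (success or a different error) is returned at once.
 */
static int send_with_bounded_retry(send_fn send, void *ctx)
{
    int retries = 0;
    int err;

    do {
        err = send(ctx);
        if (err != -EREMOTE)
            return err;
    } while (retries++ < MAX_EREMOTE_RETRIES);

    return err; /* still -EREMOTE after the last allowed retry */
}

/* Hypothetical test double: fails with -EREMOTE a fixed number of times. */
struct fake_peer {
    int stale_lookups; /* how many sends still see a stale peer location */
    int calls;         /* how many times fake_send() was invoked */
};

static int fake_send(void *ctx)
{
    struct fake_peer *p = ctx;

    p->calls++;
    if (p->stale_lookups > 0) {
        p->stale_lookups--;
        return -EREMOTE;
    }
    return 0;
}
```

With a peer that is stale for two lookups, the call succeeds on the third attempt. With a peer that never recovers, the caller gets -EREMOTE back after four attempts in total instead of looping forever.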

I rebooted node 2 and it's running fine.  I'm starting to consider the possibility of faulty hardware.

Roger


On 10/28/05, Roger Tsang <roger.tsang@gmail.com> wrote:
Brian,

Interesting.  After recompiling the kernel with RWSEM_DEBUG, I get this panic while booting node 2 (a non-initnode).

-Roger

Setting hard drive parameters for hda:  [  OK  ]
Setting hard drive parameters for hdc:  [  OK  ]
Setting hard drive parameters for hde:  [  OK  ]
Applying iptables firewall rules: Kernel panic - not syncing: icscli_handle_get: node sending message to itself!
 
Entering kdb (current=0xdec2d040, pid 132335) due to KDB_ENTER()
kdb> bt
Stack traceback for pid 132335
0xdec2d040   132335   132331  1    0   R  0xdec2d200 *10-udev.hotplug
EBP        EIP        Function (args)
0xded55b88 0xc0291588 kdb_panic+0x28 (0xc051d390, 0x0, 0xc0614920, 0xded54000, 0xdffd6360)
0xded55ba8 0xc01290a5 notifier_call_chain+0x25 (0xc0614900, 0x0, 0xc0614920, 0xded55bd0, 0xded54000)
0xded55bc4 0xc011b5fc panic+0x8c (0xc03ee824, 0x37000400, 0x4917b, 0xffffc000, 0xded54000)
0xded55be8 0xc01f3b4e icscli_handle_get+0x1fe (0x2, 0x80003, 0x0, 0xc011c206, 0xa)
0xded55c54 0xc02457b5 cli_rmtunixsvr_dgram_sendmsg+0x35 (0x2, 0xded55cac, 0xded55cb0, 0x2000640, 0x0)
0xded55d0c 0xc02472df rmtunix_dgram_sendmsg+0x17f (0xded55dcc, 0x2, 0x2000640, 0xdf5e9cc0, 0xded55ec8)
0xded55db4 0xc03a4bbe unix_dgram_sendmsg+0x24e (0xded55e1c, 0xdf5e9cc0, 0xded55ec8, 0x16c, 0x204ef)
0xded55ea8 0xc033f031 sock_sendmsg+0xc1 (0xdf5e9cc0, 0xded55ec8, 0x16c, 0xbffffc1c, 0x16c)
0xded55f74 0xc0340628 sys_sendto+0xe8 (0x0, 0xbffffc1c, 0x16c, 0x0, 0xbffffbac)
0xded55fbc 0xc034114f sys_socketcall+0x1bf
           0xc0103c85 sysenter_past_esp+0x52
kdb>