Thread: [SSI-devel] SSI-1.9 Kernel panic - not syncing: icscli_handle_get: node sending message to itself!

ssic-linux-devel

[SSI-devel] SSI-1.9 Kernel panic - not syncing: icscli_handle_get: node sending message to itself!

From: Roger T. <rog...@gm...> - 2005-10-28 17:28:30

Brian,

Interesting. After recompiling the kernel with RWSEM_DEBUG, I get this panic
while booting node2 (non-initnode).

-Roger

Setting hard drive parameters for hda: [ OK ]
Setting hard drive parameters for hdc: [ OK ]
Setting hard drive parameters for hde: [ OK ]
Applying iptables firewall rules: Kernel panic - not syncing:
icscli_handle_get: node sending message to itself!

Entering kdb (current=0xdec2d040, pid 132335) due to KDB_ENTER()
kdb> bt
Stack traceback for pid 132335
0xdec2d040   132335   132331  1    0   R  0xdec2d200 *10-udev.hotplug
EBP        EIP        Function (args)
0xded55b88 0xc0291588 kdb_panic+0x28 (0xc051d390, 0x0, 0xc0614920,
0xded54000, 0xdffd6360)
0xded55ba8 0xc01290a5 notifier_call_chain+0x25 (0xc0614900, 0x0, 0xc0614920,
0xded55bd0, 0xded54000)
0xded55bc4 0xc011b5fc panic+0x8c (0xc03ee824, 0x37000400, 0x4917b,
0xffffc000, 0xded54000)
0xded55be8 0xc01f3b4e icscli_handle_get+0x1fe (0x2, 0x80003, 0x0,
0xc011c206, 0xa)
0xded55c54 0xc02457b5 cli_rmtunixsvr_dgram_sendmsg+0x35 (0x2, 0xded55cac,
0xded55cb0, 0x2000640, 0x0)
0xded55d0c 0xc02472df rmtunix_dgram_sendmsg+0x17f (0xded55dcc, 0x2,
0x2000640, 0xdf5e9cc0, 0xded55ec8)
0xded55db4 0xc03a4bbe unix_dgram_sendmsg+0x24e (0xded55e1c, 0xdf5e9cc0,
0xded55ec8, 0x16c, 0x204ef)
0xded55ea8 0xc033f031 sock_sendmsg+0xc1 (0xdf5e9cc0, 0xded55ec8, 0x16c,
0xbffffc1c, 0x16c)
0xded55f74 0xc0340628 sys_sendto+0xe8 (0x0, 0xbffffc1c, 0x16c, 0x0,
0xbffffbac)
0xded55fbc 0xc034114f sys_socketcall+0x1bf
           0xc0103c85 sysenter_past_esp+0x52
kdb>

Re: [SSI-devel] Re: SSI-1.9 Kernel panic - not syncing: icscli_handle_get: node sending message to itself!

From: Brian J. W. <Bri...@hp...> - 2005-10-28 19:19:40

I'm not sure that retries would help. Since there's no socket migration, 
nothing's going to change regarding where rmtunix_dgram_sendmsg() is 
told to send the RPC.

There is clearly a bug here. If the sending and receiving sockets are on 
the same node, then the peer pointer of the sending "struct sock" should 
be set equal to the receiving "struct sock". In this case, it's set to a 
rmtunix_socket_info structure, which is only valid if the node field is 
equal to some other node.

Brian


Roger Tsang wrote:
> Maybe instead of only doing SSI_ASSERT() in 
> rmtunix.c::rmtunix_dgram_sendmsg() we could also return -EREMOTE, so 
> that af_unix.c::unix_dgram_sendmsg() would try again - with limited retries.
> 
> I rebooted node 2 and it's running fine.  I'm starting to consider the 
> possibility of faulty hardware.
> 
> Roger
> 

Re: [SSI-devel] Re: SSI-1.9 Kernel panic - not syncing: icscli_handle_get: node sending message to itself!

From: Roger T. <rog...@gm...> - 2005-11-05 11:22:20

I ran into this oops again, this time with RWSEM_DEBUG off.

/dev/hdc7: recovering journal
/dev/hdc7: clean, 11545/6012928 files, 9600131/12006571 blocks
[ OK ]
Mounting local filesystems: [ OK ]
rm: cannot remove `/var/run/dovecot/login': Is a directory
Kernel panic - not syncing: icscli_handle_get: node sending message to itself!

Entering kdb (current=0xdffdaaa0, pid 132277) due to KDB_ENTER()
kdb> bt
Stack traceback for pid 132277
0xdffdaaa0   132277   131883  1    0   R  0xdffdac60 *10-udev.hotplug
EBP        EIP        Function (args)
0xde787b88 0xc028d607 kdb_panic+0x27 (0xc0518e50, 0x0, 0xc0608960,
0xc0608960, 0x0)
0xde787ba8 0xc0128b85 notifier_call_chain+0x25 (0xc0608940, 0x0, 0xc0608960,
0xde787bd0, 0xde786000)
0xde787bc4 0xc011b04c panic+0x8c (0xc03ed06c, 0x37000400, 0x2aef,
0xffffd511, 0xde786000)
0xde787be8 0xc01ef3f4 icscli_handle_get+0x204 (0x2, 0x80003, 0x0,
0xc011bcb3, 0xa)
0xde787c54 0xc0241173 cli_rmtunixsvr_dgram_sendmsg+0x33 (0x2, 0xde787cac,
0xde787cb0, 0x200059e, 0x0)
0xde787d0c 0xc0242d28 rmtunix_dgram_sendmsg+0x188 (0xde787dcc, 0x2,
0x200059e, 0xddc8ea00, 0xde787ec8)
0xde787db4 0xc03a3abe unix_dgram_sendmsg+0x24e (0xde787e1c, 0xddc8ea00,
0xde787ec8, 0x16c, 0x2)
0xde787ea8 0xc033da3f sock_sendmsg+0xbf (0xddc8ea00, 0xde787ec8, 0x16c,
0xbffffc3c, 0x16c)
0xde787f74 0xc033f031 sys_sendto+0xe1 (0x0, 0xbffffc3c, 0x16c, 0x0,
0xbffffbcc)
0xde787fbc 0xc033fb5f sys_socketcall+0x1bf
           0xc0103c55 sysenter_past_esp+0x52
kdb>

