[SSI-devel] [ ssic-linux-Bugs-924120 ] ethtool crashes system
From: SourceForge.net <no...@so...> - 2004-07-07 20:53:35
Bugs item #924120, was opened at 2004-03-26 11:07
Message generated for change (Comment added) made by bjbrew
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=924120&group_id=32541

Category: Networking
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Aneesh Kumar K.V (kvaneesh)
Assigned to: Keerthi Bhushan (keerthi)
Summary: ethtool crashes system

Initial Comment:
Laura,

I've used ethtool successfully before on the cluster. But, this time, I ran ethtool (on the external NIC of the initnode) and it caused the system to lock. I got the following on the console of the second node:

Instruction(i) breakpoint #0 at 0xc0124840 (adjusted)
0xc0124840 panic_hook: int3

Entering kdb (current=0xf71cc000, pid 131150) on processor 0 due to Breakpoint @ 0xc0124840
kdb> bt
Stacktrace for pid 131150
0xf71cc000 131150 2 1 0 R 0xf71cc420 *nm cli nd daemo
EBP        EIP        Function(args)
0xf71cdf4c 0xc0124840 panic_hook (0xc038b298, 0xc05bbae0, 0xf71cdf58, 0x0)
                      kernel .text 0xc0100000 0xc0124840 0xc0124850
           0xc0124897 panic+0x47 (0xc03a2840, 0x2e323931, 0x2e383631, 0x312e30, 0x0
                      kernel .text 0xc0100000 0xc0124850 0xc0124990
0xf71cdfd4 0xc026911e clms_attempt_failover+0x15e (0x1, 0x0)
                      kernel .text 0xc0100000 0xc0268fc0 0xc0269260
0xf71cdfec 0xc02902d5 nm_client_nodedown_daemon+0x55 (0x0)
                      kernel .text 0xc0100000 0xc0290280 0xc02902f0
           0xc0290280 nm_client_nodedown_daemon
                      kernel .text 0xc0100000 0xc0290280 0xc02902f0
           0xc0107889 kernel_thread_helper+0x5
                      kernel .text 0xc0100000 0xc0107884 0xc0107890

--
Jiann-Ming Su js...@em... 404-712-2603
Development Team Systems Administrator
General Libraries Systems Division

----------------------------------------------------------------------

>Comment By: Brian J. Watson (bjbrew)
Date: 2004-07-07 13:53

Message:
Logged In: YES
user_id=16302

Another option is to make the ICS interface resistant to any meddling from user-mode.

----------------------------------------------------------------------

Comment By: Keerthi Bhushan (keerthi)
Date: 2004-07-07 05:27

Message:
Logged In: YES
user_id=825227

Running "ethtool -t eth0" on a two-node RHEL-based SSI cluster caused the secondary node to attempt a failover, and since I had no failover capability, it crashed. By default, "ethtool -t eth0" runs an offline self-test of the eth0 device. I noticed that "ethtool -t eth0 online" doesn't crash the secondary node. The crash is caused by the momentary loss of connection with the init node during the offline test.

I have the following options:
1. Make ethtool crash the secondary node more gracefully (though I don't know how).
2. Make online the default option on SSI, and warn when the user specifies the offline option that the secondary nodes might try to fail over (and potentially crash).

----------------------------------------------------------------------

Comment By: Bruce J. Walker (brucewalker)
Date: 2004-06-17 18:25

Message:
Logged In: YES
user_id=296932

I tried a simple "ethtool eth0" on an RH9 SSI rc5 system (init node and non-init node). I also ran "ethtool eth1". I had no problems. See if you can reproduce the crash, and try to provide the bt from the node that panics.
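
Option 2 from Keerthi Bhushan's comment (default to the online self-test on SSI and warn when the offline test is requested) could be sketched roughly as below. This is only an illustration of the policy, not a patch to ethtool or to SSI: the function name plan_selftest, the warning text, and the idea of a wrapper that builds the command line are all assumptions. The only things taken from the thread are the "ethtool -t eth0" and "ethtool -t eth0 online" invocations and the fact that plain "-t" defaults to the offline test.

```python
# Hypothetical wrapper policy for "ethtool -t" on an SSI cluster node.
# Nothing here is part of ethtool or SSI; it only models option 2 above.

OFFLINE_WARNING = (
    "warning: the offline self-test interrupts the link; "
    "secondary nodes may attempt failover and could crash"
)


def plan_selftest(device, mode=None):
    """Return (argv, warning) for a self-test of `device`.

    mode is None when the user gave no mode, else "online" or "offline".
    warning is None unless the offline test was explicitly requested.
    """
    if mode is None:
        # Assumed SSI policy: prefer the non-disruptive online test,
        # instead of ethtool's own default of offline.
        mode = "online"
    if mode not in ("online", "offline"):
        raise ValueError(f"unknown self-test mode: {mode!r}")
    warning = OFFLINE_WARNING if mode == "offline" else None
    # Always pass the mode explicitly so ethtool's default never applies.
    return ["ethtool", "-t", device, mode], warning
```

With this sketch, plan_selftest("eth0") yields the online invocation with no warning, while plan_selftest("eth0", "offline") still allows the offline test but returns the warning for the caller to print first. It does not address the other suggestion in the thread, making the ICS interface itself resistant to user-mode changes.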

----------------------------------------------------------------------

Comment By: Brian J. Watson (bjbrew)
Date: 2004-04-05 16:12

Message:
Logged In: YES
user_id=16302

Kishore,

Duplicate this bug and find out what's happening on the init node. Then we'll have a better idea of what needs to be fixed.

Thanks,
Brian

----------------------------------------------------------------------

Comment By: Brian J. Watson (bjbrew)
Date: 2004-03-26 14:16

Message:
Logged In: YES
user_id=16302

The backtrace merely means that the second node lost its root node. What happened on the root node is much more interesting.

Brian

----------------------------------------------------------------------