RE: [SSI-users] Cluster strange boot behaviour
From: Walker, B. J <bru...@hp...> - 2005-02-17 18:42:23
The node monitoring is done over UDP, so dropping tcp 900 etc. just hangs all other OpenSSI traffic, which will turn into hung processes (particularly things like ps, which need to talk to all nodes). I would expect that if you powered off node 3, the rest of the cluster would be just fine. We probably should monitor the OpenSSI connections and fail more gracefully when one of the connections dies (that could take quite a while with tcp timeouts), but otherwise I'm not sure there is a bug demonstrated here.

High load on the interconnect or on any node could slow down other nodes, but as long as the node is not kicked out of the cluster, things should continue. If, in normal operations (without special iptables calls), you get hangs, we may have something to debug. Netdump is the preferred way to do that, but working through kdb is possible as well. Having a test case that causes the hang (so we can reproduce and debug) is of course the best way to resolve it.

Bruce

> 2/
> iptables -I INPUT 1 -p tcp -m multiport --ports 900,902 -j DROP
> This drops all Open-SSI related links from node3 to the init one
> (node1).
> node3 is still reachable via the ping command from node1. After a few
> seconds node1 hangs as described previously. No new process can be
> launched.
> Reboot all the cluster.
>
>
> I suppose these ports (900 and 902) are used to communicate between each
> node (I didn't see anything in the docs, I didn't read all mailing lists)
> But I do not believe this could be a strange behaviour. This may be a
> bug?
>
> I think when init node runs high load, networks should lag then damage
> all slave nodes. I'm not sure but this is what I saw.
>
> Note I use the stable openssi release (1.2.0) on a debian system (just
> read the docs).
> Maybe this has been fixed in the development version
>
> Next week I should test with 2 init nodes using LVS.
>
> Regards
>
> --
> Sebastien J. Gross
>
> _______________________________________________
> Ssic-linux-users mailing list
> Ssi...@li...
> https://lists.sourceforge.net/lists/listinfo/ssic-linux-users
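
To illustrate the situation described in this thread, where a node still answers ping while its TCP links are silently dropped, here is a minimal Python sketch of a user-space probe with a short timeout. It is not part of OpenSSI and does not use any OpenSSI interface. The port numbers 900 and 902 come from the quoted iptables rule, and treating them as the interconnect ports is an assumption taken from this thread. The names `tcp_port_reachable` and `blocked_ports` are hypothetical. Whether opening test connections to these ports is harmless on a live cluster is not established here, so this is best tried on a test cluster only.

```python
import socket

# Ports taken from the quoted iptables rule; that they are the OpenSSI
# interconnect ports is an assumption based on this thread, not on the docs.
ASSUMED_SSI_TCP_PORTS = (900, 902)


def tcp_port_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. packets dropped by a
        # firewall rule), unreachable hosts and name-resolution failures.
        return False


def blocked_ports(host, ports=ASSUMED_SSI_TCP_PORTS, timeout=2.0):
    """Return the list of ports on `host` that did not accept a TCP connection."""
    return [p for p in ports if not tcp_port_reachable(host, p, timeout)]
```

Calling `blocked_ports("node3")` from node1 (the hostname is only an example) returns the ports that could not be reached within the timeout. A DROP rule like the one quoted above would show up as a timeout rather than an immediate refusal, which is why the explicit short timeout matters: without it, the caller waits for the much longer default TCP timeouts mentioned in the reply.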