From: <no...@so...> - 2002-12-09 17:49:02
Bugs item #639482 was opened at 2002-11-16 17:57
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=434892&aid=639482&group_id=43021

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jeff Squyres (jsquyres)
Assigned to: Nobody/Anonymous (nobody)
Summary: ganglia reports live nodes are dead

Initial Comment:

Setup:
- Tested on RH 7.2 and 7.3
- Installations in vmware with OSCAR 2.0 (CVS, to be released shortly...)
  with Steve Duchene's packaging of ganglia 2.5.0
- One "head" node and 2 "cluster" nodes
- Ganglia web page is served off the head node

After an initial install / boot of the cluster nodes, all three nodes
appear on the ganglia web page as healthy. A short time later, the two
cluster nodes are marked as down. This is a repeatable problem.

Restarting the gmond on the head node ("service gmond restart") causes the
two cluster nodes to disappear from the ganglia web page. Restarting the
gmond on the two cluster nodes makes them appear up and healthy on the
ganglia web page, and they don't seem to disappear, even after
[relatively] long periods of uptime. Specifically, heartbeats appear to
come in at regular intervals (according to the ganglia web page).

When a cluster node is marked as down, rebooting it makes it show as up
briefly, and then eventually dead again (i.e., no heartbeat for over 60
seconds). All of these cases definitely exhibit a few heartbeats in the
beginning (I don't know how many, but it's at least one), but then the
heartbeats inexplicably stop, and therefore the nodes get marked down.

We have occasionally seen this problem on real-machine installs as well
(vs. vmware installs). I don't have any firm data on that, although the
reports came from credible testing personnel. I don't know if this is a
vmware-ism, or an OSCAR-ism, or something else. This problem has plagued
OSCAR for quite a while now, and any insight that you guys could provide
would be most helpful.

One note about vmware installs: the time on the cluster nodes is way off
(compared to the head node), and there doesn't seem to be a way to fix it.
The head node has the correct time (set by ntp to an external server), but
the two nodes are never right, and resist all attempts to change their
time (there appears to be something in vmware that repeatedly and
aggressively sets the time to something way off from reality). I am using
vmware workstation version 3.1.

I'd be happy to offer temporary access to machines (i.e., vmware
instances) if any ganglia developers would benefit from poking around to
see what's going wrong.

----------------------------------------------------------------------

>Comment By: Jeff Squyres (jsquyres)
Date: 2002-12-09 12:49

Message:
Logged In: YES user_id=11722

Same setup: 3 vmware nodes, RH 7.3, one "head" node, two "client" nodes.

When all is well (i.e., nodes are marked as up, etc.), "tcpdump ether
multicast" reports lots of multicast activity from all three nodes.
However, sometimes the multicast activity from the client nodes
mysteriously stops. That is, there is activity for a while, and then it
just stops. I say this to clarify previous comments -- there wasn't just
"one" multicast ping; there seemed to be a bunch, and then it just
stopped. I ran "tcpdump ether multicast" on the head node the whole time
and observed multicast activity from the head node during the entire
timeframe.
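For reference, a minimal listener along these lines -- only a sketch,
assuming Python's standard socket module and gmond's stock multicast
channel of 239.2.11.71 on port 8649 -- shows the same per-host traffic
that "tcpdump ether multicast" does, one line per packet:

    import socket
    import struct
    import time

    GROUP = "239.2.11.71"   # assumed gmond default multicast group
    PORT = 8649             # assumed gmond default port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # Join the multicast group on the default interface.
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    # Print a line for every packet seen on the channel, so a node whose
    # gmond has gone quiet is easy to spot.
    while True:
        data, (addr, port) = sock.recvfrom(65535)
        print(time.strftime("%H:%M:%S"), addr, len(data), "bytes")

A node whose gmond has gone silent simply stops appearing in the output.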
When the multicast activity from the client nodes stopped, I logged in and
ran "tcpdump ether multicast" on each node. I was working on the
assumption that the client nodes would still be sending multicast data,
but because of switching issues, somehow would not "see" the head node.
Unfortunately, this was not the case; the client nodes could still see all
the multicast activity from the head node -- they just weren't generating
any.

I should note that the client nodes were sending/receiving ARPs during
this time -- so it looks like their interaction with the switch is still
ok. ps on the client nodes confirmed that gmond is still running.

I have attached the output (from the head node) of "tcpdump ether
multicast" showing activity from the head node (queegvm.oscar.vmware),
followed by the boot-up activity of oscarnode2.oscar.vmware, several
multicast ganglia "pings" from oscarnode2, and then it eventually stops.
Note that after oscarnode2's ganglia multicast activity stops, there's
still one more ARP from oscarnode2. So it seems as if oscarnode2 can still
get packets onto the network -- it's just that its gmond went silent. At
least, that's a guess... Is there something else that I should check?

----------------------------------------------------------------------

Comment By: Jeff Squyres (jsquyres)
Date: 2002-12-04 16:24

Message:
Logged In: YES user_id=11722

Hi Mason -- thanks for the reply. Please re-read my initial description
and my replies: after an initial failure (i.e., one-and-only-one ping), if
I restart the gmonds, ganglia works fine (i.e., lots of pings on a regular
basis). I personally have only tested under vmware, but others have tested
on real clusters and are seeing the same behavior.

I'm not much of a multicast programmer (read: not at all). Is there some
kind of canonical test program that I can use? In the meantime, I'll check
and see what tcpdump is saying about multicast behavior and get back to
you.

----------------------------------------------------------------------

Comment By: Mason Katz (masonkatz)
Date: 2002-12-04 16:18

Message:
Logged In: YES user_id=463741

We've seen several switches that allow only the first multicast message
out of the switch and then drop everything else because IGMP is not
configured by default. Unfortunately, getting this right is different on
every switch, but we have yet to find a switch that can't be made to work.

If this is just a VMWare cluster problem, then first build a real cluster
and make sure the software works for you the way everyone else uses it.
Then figure out what the VMWare virtual network layer is actually doing.
Is this only a problem on virtual machines all on the same physical host?
Is this a problem with a real cluster of nodes all running a single
session of VMWare? I know VMWare has several different network models;
maybe you've just selected the wrong one. It's tricky, since the multicast
support probably needs to be in the VMWare loop-back layer. If it isn't,
the switch your machine is connected to will need to be configured
correctly. Maybe this is the problem: are you building virtual clusters
without a physical network?

I'd suggest you write a simple multicast client/server application and
debug the multicast layer in VMWare (see the sketch below). Multicast is a
standard, and even really bad switches should just revert to broadcast.
Lastly, what does tcpdump show is going on?

-mjk (only builds physical clusters)
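A bare-bones periodic multicast sender, to pair with a listener like the
one sketched earlier, is enough for the kind of client/server test Mason
describes. This is only a sketch, again assuming Python's standard socket
module; the group and port are arbitrary and just need to match on both
ends:

    import socket
    import struct
    import time

    GROUP = "239.2.11.71"   # must match the listener; any free multicast address works
    PORT = 8649

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # TTL of 1 keeps the test packets on the local network segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", 1))

    # Send a numbered "ping" every few seconds.
    seq = 0
    while True:
        sock.sendto(("test ping %d" % seq).encode(), (GROUP, PORT))
        seq += 1
        time.sleep(5)

Run the sender on a client node and the listener on the head node: if the
numbered pings keep arriving after gmond's heartbeats stop, the multicast
path is fine and gmond itself has gone quiet; if they stop too, the
VMWare/switch layer is the more likely culprit.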
----------------------------------------------------------------------

Comment By: Jeff Squyres (jsquyres)
Date: 2002-12-02 18:29

Message:
Logged In: YES user_id=11722

Please re-read my initial description -- I am using vmware. There have
been reliable reports of others using real machines that run into the same
problems. I'm quite sure that they were all using multicast-capable
switches. But even if the switch were the problem, wouldn't ganglia fail
consistently, as opposed to behaving inconsistently?

----------------------------------------------------------------------

Comment By: Federico David Sacerdoti (sacerdoti)
Date: 2002-12-02 11:24

Message:
Logged In: YES user_id=581045

I hear what you are saying about the intermittent failure. However, if you
could positively verify that your ethernet switch does support IP
multicast and IGMP, and that those features are turned on, it would help
us come up with further tests. Run a crossover cable between a frontend
and a compute node, so as to bypass the switch, and test again. Do you see
the same problem?

I don't believe it is a ganglia software problem, as we here at SDSC have
10+ clusters running ganglia 2.5.1 correctly. We have never seen this
problem except during initial cluster setup, and the culprit was always
the switch.

Good luck,
Federico

----------------------------------------------------------------------

Comment By: Jeff Squyres (jsquyres)
Date: 2002-12-01 23:18

Message:
Logged In: YES user_id=11722

It does not explain, however, why *sometimes* it works and sometimes it
doesn't. For example: after a reboot (where the gmonds are automatically
started), we see the one-and-only-one ping behavior. But if I manually
restart the gmonds, all is well (i.e., pings happen continuously and
regularly). Any other ideas?

----------------------------------------------------------------------

Comment By: Federico David Sacerdoti (sacerdoti)
Date: 2002-11-30 20:12

Message:
Logged In: YES user_id=581045

It sounds like a problem we have seen before. In our case, the ethernet
switch did not support multicast and the IGMP protocol. It would stifle
all multicast packets except the first one, and we would see the same
behavior you describe.

Hope this helps,
Federico

----------------------------------------------------------------------

Comment By: Jeff Squyres (jsquyres)
Date: 2002-11-29 10:21

Message:
Logged In: YES user_id=11722

It's been about 2 weeks -- does anyone have any insight on this problem?
Thanks.

----------------------------------------------------------------------

Comment By: Jeff Squyres (jsquyres)
Date: 2002-11-16 18:00

Message:
Logged In: YES user_id=11722

Correction: this is actually ganglia 2.5.1, not 2.5.0. Sorry for the
confusion.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=434892&aid=639482&group_id=43021