From: <al...@cr...> - 2007-10-22 21:40:46
Second post today, separate topic...

I've got a few machines set up as active/passive clusters running heartbeat/drbd. I am currently monitoring them with ganglia, but I think the information I'm getting gives a misleading picture.

Since both machines are monitored, it looks like I have 8 processors in the cluster (4 each in 2 boxes). But in reality, only 1 of these machines is ever available at a time. I keep a mental note that any time these clusters are more than 50% utilized, they're really >100% utilized, since the CPUs, RAM, etc. of the passive node really shouldn't count in the totals. Always having to drill down to the level of the individual machine to see what's going on is kind of a pain.

The only solution I've thought of is to keep gmond turned off on the passive node and start it during a resource migration. This would be easy enough, but it would have 2 drawbacks:
1. My stats would say 50% of my cluster is 'down' although it's functioning correctly.
2. It is sometimes useful to monitor stuff on the passive node, and I don't really want to lose that ability.

Any better ways to do this? Maybe extend the PHP frontend to be configurable for monitoring active/passive? (Would anyone else have a use for that besides me?)

thanks,
alex
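The 50% rule of thumb is just a change of denominator: count the busy CPUs, then divide by the CPUs that can actually serve instead of by every CPU in the cluster. A minimal Python sketch, assuming two 4-CPU boxes with the active node saturated and the passive node idle; the host names and sample values are hypothetical:

```python
# Hypothetical per-host samples: name -> (cpu_count, busy_percent).
# Assumes the active node is saturated and the passive node is idle.
hosts = {"node-a": (4, 100.0), "node-b": (4, 0.0)}


def busy_cpus(hosts):
    """CPUs' worth of work being done, summed over every host."""
    return sum(cpus * pct / 100 for cpus, pct in hosts.values())


def naive_cluster_pct(hosts):
    """Busy CPUs divided by every CPU in the cluster, passive node included."""
    total_cpus = sum(cpus for cpus, _ in hosts.values())
    return 100 * busy_cpus(hosts) / total_cpus


def active_capacity_pct(hosts, active):
    """Busy CPUs divided only by the CPUs of the node(s) that can serve."""
    usable_cpus = sum(cpus for name, (cpus, _) in hosts.items() if name in active)
    return 100 * busy_cpus(hosts) / usable_cpus


print(naive_cluster_pct(hosts))                # 50.0
print(active_capacity_pct(hosts, {"node-a"}))  # 100.0
```

With two equal nodes and one active, the active-only figure is always twice the naive one, which is where "more than 50% means more than 100%" comes from.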
From: richard g. <gr...@ds...> - 2007-10-23 02:03:44
Alex,

Are they the only 2 members of the cluster? How about this:

- The gmond.conf on host A is configured for unicast and sends data to the *physical* address (not the VIP) of host B. Do not configure gmond.conf to send data to itself; the only UDP send channel is to host B.
- Configure the host B gmond.conf the same way, so it sends its UDP data from host B to host A and not to itself.
- Configure gmetad.conf to poll the floating VIP address of the cluster.

Now I have to say I am away from my computers, so this is a thought experiment. But at my previous company we did this for a failover pair of servers that were the head nodes of a larger cluster. It worked OK, I think.

I also seem to remember that the Cygwin gmond behaves a little differently from the Linux one when you do/don't want metrics sent to the host itself.

Let me know if this helps. :-)

regards,
Richard
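To reason about what gmetad would receive under this layout, here is a toy model in Python. It rests on an assumption that has not been checked against a running Ganglia: a unicast gmond reports only the hosts whose metric packets arrive on its udp_recv_channel, so a node that never sends to itself does not appear in its own report. The node names are hypothetical.

```python
from collections import defaultdict


def hosts_seen_by(send_channels, node):
    """Hosts whose metric packets reach `node`'s gmond (toy model)."""
    inbox = defaultdict(set)
    for sender, targets in send_channels.items():
        for target in targets:
            # Each udp_send_channel target receives the sender's metrics.
            inbox[target].add(sender)
    # Assumption: a unicast gmond reports only hosts heard on its
    # udp_recv_channel, so a node that never targets itself is absent.
    return inbox[node]


# Layout from the message above: each node unicasts only to its peer's
# physical address, never to itself.
cross_send = {"node-a": {"node-b"}, "node-b": {"node-a"}}

# gmetad polls the VIP, so it gets whatever the current VIP holder reports.
for vip_holder in ("node-a", "node-b"):
    print(vip_holder, "->", sorted(hosts_seen_by(cross_send, vip_holder)))
```

In this model the polled XML always contains exactly one host, but it is the peer of the VIP holder rather than the VIP holder itself, which is worth confirming on a real pair before relying on it.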
From: richard g. <gr...@ds...> - 2007-10-23 12:44:46
Alex,

Oh dear, it looks like I answered the wrong question *again*. As I don't have test access to a running ganglia, someone else should answer. But part of it may be to:

- Configure gmetad.conf to poll the failover VIP's IP address or DNS name, not the physical ones.
- Configure each server in the failover pair to send UDP data to itself. Don't use 127.0.0.1 and don't use the floating VIP; use an IP address that stays with the server.
- Now you have to arrange it so that the source address of the UDP being looped back maps to the same hostname string on both servers when the gmond hosts do a reverse DNS lookup.

Note that the host names in the XML stream that goes to gmetad are generated by gmond doing a reverse DNS lookup on the source IPs of the UDP traffic. There are a few ways you may arrange this.

kind regards,
richard
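One way to picture the last step: if each node's resolver maps that node's own stable address to a shared name, the looped-back packets get the same host name whichever node is active. The sketch below assumes per-node /etc/hosts entries as the mechanism (one possible option among the "few ways", not necessarily the one Richard had in mind) and assumes the lookup returns the first name on the matching line. All addresses (from the 192.0.2.0/24 documentation range) and names such as `ha-pair` are hypothetical.

```python
def reverse_lookup(hosts_text, ip):
    """First name listed for `ip` in /etc/hosts-style text, else the IP itself."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split
        if len(fields) >= 2 and fields[0] == ip:
            return fields[1]
    return ip


# Hypothetical per-node hosts files: each node maps its OWN stable address
# to the shared name "ha-pair" and its peer's address to the peer's name.
node_hosts = {
    "node-a": ("192.0.2.11", "192.0.2.11  ha-pair\n192.0.2.12  node-b\n"),
    "node-b": ("192.0.2.12", "192.0.2.12  ha-pair\n192.0.2.11  node-a\n"),
}


def reported_name(node):
    """Name this node's gmond would attach to its own looped-back packets (toy model)."""
    self_ip, hosts_text = node_hosts[node]
    return reverse_lookup(hosts_text, self_ip)


for node in node_hosts:
    print(node, "->", reported_name(node))
```

Combined with gmetad polling the VIP, only the VIP holder's gmond is queried, so in this model the cluster would show a single host named `ha-pair` regardless of which node is active, while the passive node's gmond still holds its own metrics for direct inspection.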