Frequently asked questions about Ganglia
What is the difference between unicast and multicast modes, and in what situation should I use them?
Multicast mode is the default. It is the simplest to set up and also provides redundancy. Environments that are sensitive to "jitter" may consider running Ganglia in unicast mode, which significantly reduces the chatter but is a bit more complex to configure. Some environments, such as Amazon EC2, do not support multicast, so unicast is the only option available there.
My graphs are generated, but some/all have no data
Make sure that the time on the gmond node is correct and matches the time on the gmetad host that collects its data. Consider using NTP on these machines to keep time synchronized.
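One quick way to see whether the clocks disagree is to compare epoch seconds from both machines. The following is a minimal sketch, not part of Ganglia: the function name clock_skew and the hostname are hypothetical, and it assumes SSH access to the gmond node and a date command that supports +%s.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute difference, in seconds,
# between two epoch timestamps passed as arguments.
clock_skew() {
    a=$1
    b=$2
    if [ "$a" -ge "$b" ]; then
        echo $((a - b))
    else
        echo $((b - a))
    fi
}

# Example, run on the gmetad host (the hostname is a placeholder):
#   clock_skew "$(date +%s)" "$(ssh gmond-node.example.com date +%s)"
```

A result of a second or two mostly reflects the SSH round trip; a much larger difference suggests the clocks need to be synchronized.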
Sometimes metric graphs don't show up for hosts, and I'm using unicast mode
Or: when using unicast, restarting the collector gmond causes metric graphs to disappear
In previous versions of Ganglia, the monitoring daemon gmond sent metadata only when it started. In a unicast configuration, this behavior can cause all metric graphs to disappear from the host-view page if the collecting gmond is restarted. The hosts will appear up, but no data will appear on the host-view page, and the CPU counts will be off as well. Restarting all of the non-collector gmond daemons makes the metric graphs reappear, but this may not be feasible for large clusters.
In recent versions of gmond (3.1.x), a new global variable called 'send_metadata_interval' was added to gmond.conf, with a default setting of 0. The aim was to reduce network traffic: in 3.1, metric data is sent separately from metadata (the detailed description, grouping, and other settings of a metric). A value of zero means that gmond sends metadata when it starts and at no other time, which is consistent with older versions of Ganglia.
If you plan on using unicast mode, please set "send_metadata_interval" to something other than 0. 30-60 seconds has been found to work reliably in most cases. Setting this variable to a non-zero value will make the gmond processes periodically announce their metrics and the graphs will reappear on the host-view page.
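To check what a node is currently configured with, the value can be read straight out of its gmond.conf. This is a minimal sketch under stated assumptions: the function name metadata_interval is hypothetical, the setting is assumed to sit on a line of its own in the form send_metadata_interval = N, and the path in the example may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric value assigned to
# send_metadata_interval in the gmond.conf file given as $1.
# Prints nothing if no such assignment is found.
metadata_interval() {
    sed -n 's/^[[:space:]]*send_metadata_interval[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1"
}

# Example (the path is an assumption):
#   metadata_interval /etc/ganglia/gmond.conf
```

If this prints 0 or nothing on a unicast node, gmond announces metadata only at startup, which is the situation described above.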
My web interface now says 'fsock: unable to open...'
Your conf.php file has disappeared from your webroot. Restore it from a backup or retrieve it from a Ganglia tarball (be sure to run make -C web conf.php version.php first).
None of this has helped. Do you have any debugging tips?
- For gmond:
- Check that the gmond service is running: ps aux | grep gmond
- Stop the gmond service and run it by hand in debug mode: /etc/init.d/gmond stop; /usr/sbin/gmond -d 2. Look for errors near the top of the output.
- Attempt to retrieve the XML data by connecting to the gmond daemon with netcat: nc <hostname> 8649
- Confirm that UDP traffic can pass between the gmetad and gmond (or between gmond and other gmonds, for multicast purposes): run nc -u -l 8653 on the host in question, then run echo "hello" | nc -u <hostname> 8653 from the gmetad or another gmond.
- Check gmond data using /usr/bin/gstat -a
- For gmetad:
- Check that the gmetad service is running: ps aux | grep gmetad
- Check syslog for errors: tail /var/log/messages
- Stop the gmetad service and run it by hand in debug mode: /etc/init.d/gmetad stop; /usr/sbin/gmetad -d 2. Look for errors near the top of the output.
- Ensure that /var/lib/ganglia and its children are owned and writable by the nobody user (the ganglia user on Debian/Ubuntu).
- Retrieve the XML data by netcatting to the gmetad daemon. nc <hostname> 8650. This information is useful for submitting bug reports.
- For the web interface:
- Monitor the web server error log. PHP errors will appear here. tail -f /var/log/apache2/error_log.
- Ensure that the settings in conf.php are correct. If you are installing from source, don't just copy the web/ folder and rename conf.php.in and version.php.in; both files contain variables that need to be set. Run make -C web conf.php version.php or fill in the variables by hand (there are only 2, and both are enclosed by @'s).
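The XML retrieved with nc in the gmond and gmetad steps above can be long, so it may help to reduce it to the list of hosts that are actually reporting. This is a rough sketch rather than an official tool: the function name list_hosts is hypothetical, it assumes that each host appears as a HOST element whose first attribute is NAME, and it relies on a grep that supports -o (GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical helper: read Ganglia XML on stdin and print one host name
# per line, taken from the NAME attribute of each HOST element.
list_hosts() {
    grep -o '<HOST NAME="[^"]*"' | sed 's/^<HOST NAME="//; s/"$//'
}

# Examples, using the ports mentioned in the steps above:
#   nc <hostname> 8649 | list_hosts    # hosts known to a gmond
#   nc <hostname> 8650 | list_hosts    # hosts known to the gmetad
```

A host that is missing from the gmond output but expected in the cluster points to a problem between that host's gmond and the collector, rather than in gmetad or the web interface.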