
Ganglia

Ganglia Installation Notes

This HOWTO covers installing base Ganglia on a system running xCAT that uses stateless images. Please update it if you find errors. The example uses RedHat 5.x and is deliberately simple: we install Ganglia on the management node and then on the stateless images. Better documentation is available at the Ganglia site.

Install Prereqs

yum -y install apr-devel apr-util check-devel cairo-devel pango-devel
wget ftp://ftp.muug.mb.ca/mirror/fedora/linux/releases/9/Everything/source/SRPMS/libconfuse-2.6-1.fc9.src.rpm
rpm -ivh libconfuse-2.6-1.fc9.src.rpm
cd /usr/src/redhat/SPECS
rpmbuild -ba --target x86_64 libconfuse.spec
cd ../RPMS/x86_64/
rpm -ivh libconfuse-devel-2.6-1.x86_64.rpm libconfuse-2.6-1.x86_64.rpm

Get and Install RRDTool

This example will use all the defaults to install RRDTool. RRDTool is the round robin database engine behind Ganglia and also the way Ganglia plots graphs.

cd /install/packages/
wget http://oss.oetiker.ch/rrdtool/pub/rrdtool.tar.gz
tar zxvf rrdtool.tar.gz
cd rrdtool-*
./configure
make -j8
make install
which rrdtool
echo "/usr/local/rrdtool-1.3.4/lib" >/etc/ld.so.conf.d/rrd.conf
ldconfig
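
The echo line above hard-codes the 1.3.4 directory, so it only works if that is the release the tarball unpacked to. Here is a minimal sketch that looks the directory up instead. It assumes the default install prefix of /usr/local/rrdtool-<version>, which is what the path above implies. The function name rrd_lib_dir and its optional base-directory argument are made up for this example.

```shell
# Print the lib directory of an rrdtool install found under a base directory.
# Assumption: the default ./configure prefix, /usr/local/rrdtool-<version>.
rrd_lib_dir() {
    base="${1:-/usr/local}"
    # ls sorts names lexically; that is enough when one version is installed
    dir=$(ls -d "$base"/rrdtool-*/lib 2>/dev/null | tail -n 1)
    [ -n "$dir" ] || return 1
    printf '%s\n' "$dir"
}

# Usage (as root), in place of the hard-coded echo:
#   rrd_lib_dir > /etc/ld.so.conf.d/rrd.conf && ldconfig
```

If nothing matches, the function prints nothing and returns non-zero, so the && in the usage line keeps ldconfig from running against an empty file.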

Get and Install Ganglia

Download the ganglia-3.1.1.tar.gz file and place it in /install/packages.

cd /install/packages/
tar zxvf ganglia-3.1.1.tar.gz
cd ganglia-3.1.1/
./configure --with-gmetad
make -j8
make install
cp -a web /var/www/html/ganglia/
cp gmetad/gmetad.init /etc/rc.d/init.d/gmetad
cp gmond/gmond.init /etc/rc.d/init.d/gmond
mkdir /etc/ganglia
gmond -t | tee /etc/ganglia/gmond.conf
cp gmetad/gmetad.conf /etc/ganglia/
mkdir -p /var/lib/ganglia/rrds
chown nobody:nobody /var/lib/ganglia/rrds
chkconfig --add gmetad
chkconfig --add gmond

Configure Ganglia on the Management Node

Edit /etc/ganglia/gmond.conf so that:

name = "unspecified"

becomes

name = "yourclustername"
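
If you would rather script that edit than open an editor, something like the following should do it. This is only a sketch: the function name set_cluster_name is made up, and it assumes the new name contains nothing that is special to sed (no slashes, ampersands or quotes).

```shell
# Replace the cluster name placeholder in a gmond.conf-style file.
# $1 = config file, $2 = new cluster name (assumed to be plain letters/digits).
set_cluster_name() {
    conf="$1"
    name="$2"
    # Only lines of the form   name = "unspecified"   are rewritten;
    # other "unspecified" settings (owner, location, ...) are left alone.
    sed "s/^\([[:space:]]*name[[:space:]]*=[[:space:]]*\)\"unspecified\"/\1\"$name\"/" \
        "$conf" > "$conf.new" && mv "$conf.new" "$conf"
}

# Usage:
#   set_cluster_name /etc/ganglia/gmond.conf yourclustername
```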

Next, enable support for the Python modules. Add the following module block inside the modules { } stanza, and make sure the two include lines follow it:

modules {
  #... a bunch of modules
  module {
    name = "python_module"
    path = "modpython.so"
    params = "/usr/lib64/ganglia/python_modules/"
  }
}
include ('/etc/ganglia/conf.d/*.conf')
include ('/etc/ganglia/conf.d/*.pyconf')

Change /var/www/html/ganglia/conf.php so that

define("RRDTOOL", "/usr/bin/rrdtool");

becomes

define("RRDTOOL", "/usr/local/bin/rrdtool");

On my cluster, eth1 on the management server is the interface that connects to the compute nodes. To make my gmond traffic go out on the same broadcast domain as the compute nodes, I added this route:

route add -host 239.2.11.71 dev eth1

(You should use the same 239.2.11.71 unless you changed it in the gmond.conf file, but attach it to whatever interface is connected to the nodes.)

To permanently add this route, so that it is available upon reboot, create the file /etc/sysconfig/network-scripts/route-<your ethernet>. (I wanted to bind Ganglia to eth1, so I created /etc/sysconfig/network-scripts/route-eth1). Then add the contents:

239.2.11.71 dev eth1
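
The same file can be created from a script. The sketch below appends the route line only if it is not already there, so it is safe to run more than once. The function name write_mcast_route and its arguments are my own invention; the defaults are the RedHat directory and the multicast address used above.

```shell
# Write a static host route for the gmond multicast address.
# $1 = interface
# $2 = directory holding route-<iface> files (default: RedHat location)
# $3 = multicast address (default: 239.2.11.71, as in gmond.conf)
write_mcast_route() {
    iface="$1"
    dir="${2:-/etc/sysconfig/network-scripts}"
    addr="${3:-239.2.11.71}"
    file="$dir/route-$iface"
    line="$addr dev $iface"
    # Append only when the exact line is missing, so reruns are harmless
    grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   write_mcast_route eth1
```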

At this point we can start everything and look at the web page to see our management node:

service gmond start
service gmetad start
service httpd restart

Now pull up a web browser and look at the management node:
http://localhost/ganglia

If you followed the instructions (and if I documented them correctly), you should now see the graphs of your cluster and the management node. Nice work! Now let's install the compute nodes.

Compute Nodes (Quick and Dirty method)

For compute nodes, we just need to copy the files on to them and it will work fine. Assuming you have a compute group, just run the following:

pscp /usr/sbin/gmond compute:/usr/sbin/gmond
psh compute mkdir -p /etc/ganglia/
pscp /etc/ganglia/gmond.conf compute:/etc/ganglia/
pscp /etc/init.d/gmond compute:/etc/init.d/
pscp /usr/lib64/libganglia-3.1.1.so.0 compute:/usr/lib64/
pscp /lib64/libexpat.so.0 compute:/lib64/
pscp /usr/lib64/libconfuse.so.0 compute:/usr/lib64/
pscp /usr/lib64/libapr-1.so.0 compute:/usr/lib64/
pscp -r /usr/lib64/ganglia compute:/usr/lib64/
psh compute service gmond start

Now if you restart gmetad or refresh your web browser on the management node you should see all the nodes in it.
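
If a node does not show up, it can help to look at what gmond itself is reporting before blaming the web front end. gmond dumps its state as XML to anyone who connects to its TCP port (8649 unless you changed tcp_accept_channel in gmond.conf). The helper below just pulls the host names out of that XML. The function name list_gmond_hosts is made up, and the usage line assumes nc (netcat) is installed.

```shell
# Print the unique host names found in a gmond XML dump read from stdin.
list_gmond_hosts() {
    grep -o '<HOST NAME="[^"]*"' | sed 's/^<HOST NAME="//; s/"$//' | sort -u
}

# Usage on the management node:
#   nc localhost 8649 | list_gmond_hosts
```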

Compute Nodes (Stateless more permanent method)

This is basically the same as the other method, but here we put those files in the stateless image. My stateless image is called 'compute'. In this example we've already run genimage to create the base stateless image. To add Ganglia to the existing stateless image I do the following:

export IMGROOT=/install/netboot/rhels5.2/x86_64/compute/rootimg
echo $IMGROOT
cp /usr/sbin/gmond $IMGROOT/usr/sbin/gmond
mkdir -p $IMGROOT/etc/ganglia/
cp /etc/ganglia/gmond.conf $IMGROOT/etc/ganglia/
cp /etc/init.d/gmond $IMGROOT/etc/init.d/
cp /usr/lib64/libganglia-3.1.1.so.0 $IMGROOT/usr/lib64/
cp /lib64/libexpat.so.0 $IMGROOT/lib64/
cp /usr/lib64/libconfuse.so.0 $IMGROOT/usr/lib64/
cp /usr/lib64/libapr-1.so.0 $IMGROOT/usr/lib64/
cp -a /usr/lib64/ganglia $IMGROOT/usr/lib64/
chroot $IMGROOT chkconfig --add gmond
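
Before packing the image, it is worth a quick check that every file actually landed in it, since a missing library only shows up once the node has booted. A small sketch, with a made-up function name (missing_in_image); pass it the image root and the paths you expect to be there.

```shell
# Report which of the given paths are missing under an image root.
# $1 = image root (e.g. $IMGROOT); remaining arguments = absolute paths to check.
# Returns non-zero if anything is missing.
missing_in_image() {
    root="$1"
    shift
    rc=0
    for f in "$@"; do
        if [ ! -e "$root$f" ]; then
            printf 'missing: %s\n' "$f"
            rc=1
        fi
    done
    return $rc
}

# Usage with some of the files copied above:
#   missing_in_image "$IMGROOT" /usr/sbin/gmond /etc/ganglia/gmond.conf \
#       /etc/init.d/gmond /usr/lib64/libconfuse.so.0
```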

Now I pack the image up and deploy it to nodes:

packimage -p compute -a x86_64 -o rhels5.2
nodeset compute netboot
rpower compute boot

Now when these nodes come up they're all set and you should see them in your Ganglia web page.

Add rvitals to Ganglia monitoring

You can add rvitals to Ganglia monitoring by using the spoofing mechanism of gmetric. This is a two-step process: 1. Use a script to run rvitals and 2. Use a cron job to run it as often as you feel you should. The following script can be saved as /opt/xcat/share/xcat/scripts/xcat-gmetric.pl:

#!/usr/bin/perl
use strict;
use Socket;

my $nr = shift;
if($nr eq ""){
        print "please supply an xCAT noderange as an argument\n";
        exit 1;
}
#if($ENV{XCATROOT} eq ""){
#       print "XCATROOT is not defined in environment\n";
#       print "Is xCAT installed?\n";
#       exit 1;
#}

my %vitals = (
        temp => ["Celsius", "int16"],
        voltage =>  ["Voltage", "float"],
        fanspeed => ["RPM", "float"]
);

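# Query each vital type and publish every reading through gmetric spoofing.
foreach my $k (keys %vitals){
        my ($unit, $type) = @{ $vitals{$k} };
        foreach my $i (`rvitals $nr $k`) {
                chomp $i;
                # Each line is expected to look like "host: description: value [unit]"
                my($host,$desc,$val) = split(": ", $i);
                next unless defined $val;
                # Keep only the first word of the value, dropping any trailing unit
                $val = (split(" ", $val))[0];
                next unless defined $val;

                # gmetric spoofing needs the node's IP as well as its name
                my $packet_ip = gethostbyname($host);
                if(!defined $packet_ip){
                        print "Could not get IP for $host\n";
                        next;
                }
                my $ip = inet_ntoa($packet_ip);
                my $cmd = "gmetric -n '$desc' -v $val -t $type -u $unit -S $ip:$host";
                `$cmd`;
        }
}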
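
When a metric does not show up, it helps to see the gmetric command the script would build for a given rvitals line without sending anything. The shell sketch below mimics the script's splitting for a simple line and only prints the command. The function name show_gmetric_cmd is made up, and the sample line in the usage comment is a hypothetical example of rvitals output, not captured from a real node.

```shell
# Split one "host: description: value unit" line the way the Perl script does
# and print the gmetric command it would run (nothing is executed).
# $1 = rvitals line, $2 = gmetric type, $3 = unit, $4 = node IP for spoofing.
show_gmetric_cmd() {
    line="$1"; mtype="$2"; unit="$3"; ip="$4"
    host=${line%%: *}     # text before the first ": "
    rest=${line#*: }      # everything after the first ": "
    desc=${rest%%: *}     # text before the next ": "
    val=${rest#*: }       # value plus optional unit
    val=${val%% *}        # keep only the first word
    printf "gmetric -n '%s' -v %s -t %s -u %s -S %s:%s\n" \
        "$desc" "$val" "$mtype" "$unit" "$ip" "$host"
}

# Usage (hypothetical rvitals line):
#   show_gmetric_cmd "n1: Ambient Temp: 23 C" int16 Celsius 10.0.0.1
```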

Then you can create a crontab entry that will run it as often as you like. To run it every minute, run:

crontab -e

Then add the following line to it:

* * * * * /opt/xcat/share/xcat/scripts/xcat-gmetric.pl compute

That will run the script every minute and run IPMI commands on the compute group. You can use the xCAT noderange options to run on different compute groups. Once the cron job is running you will start to see IPMI entries in the Ganglia graph. Mine looks kind of like this:

[Image: Xcat-ganglia.png]

...

Conclusion

Hopefully this helps you get started with gmond. I would encourage other readers to add their customizations and send them to the xCAT mailing list or to me directly so I can post them: vallard AT benincosa.com
