Integrating Ganglia with rrdcached
According to the rrdcached manpage, rrdcached is "a daemon that receives updates to existing RRD files, accumulates them and, if enough have been received or a defined time has passed, writes the updates to the RRD file."
rrdcached is useful in environments where Ganglia monitors a large number of servers (1000+) and/or a large number of metrics. In these environments, if the rrd files are stored on traditional hard disks, the server will experience high I/O wait from the constant updating of many rrd files. The commonly accepted workaround is to store the rrd files in tmpfs, which eliminates the issue. The downside is that the rrd files then need to be manually backed up on a regular basis to prevent data loss if the server crashes or shuts down.
rrdcached can prevent the I/O storm caused by updates to a large number of rrd files by staggering the writes to disk. In practice it has been quite successful in lowering the I/O wait of servers running gmetad and the web frontend.
Setting it up
To integrate Ganglia with rrdcached, you will need at least version 3.1.7 of Ganglia and 1.4 of RRDtool. Older versions of Ganglia will also work, but you will need to modify some code manually. That is outside the scope of this document.
If you are using a pre-packaged version of Ganglia (e.g. RPM, deb), make sure that it was linked against RRDtool 1.4 or above.
Starting up rrdcached
For Ganglia, two processes need to access the rrdcached daemon: gmetad and apache (or whatever web server you are running). In a typical setup, the gmetad process is owned by nobody (or in some cases ganglia) and the apache process is owned by apache. When you start the rrdcached daemon, you will need to make sure the UNIX domain socket is both readable and writable by these processes.
Add nobody to apache's group in /etc/group, then start rrdcached with the following command:
# su nobody -c 'rrdcached -p /tmp/rrdcached.pid \
    -s apache -m 664 -l unix:/tmp/rrdcached.sock \
    -s nogroup -m 777 -P FLUSH,STATS,HELP -l unix:/tmp/rrdcached.limited.sock \
    -b /var/lib/ganglia/rrds -B' -s /bin/sh
This starts rrdcached as the user nobody and places the pid file in /tmp. The main UNIX domain socket, rrdcached.sock, is created in /tmp as well, with its group set to apache and permissions of 664. An additional "limited" socket is created at /tmp/rrdcached.limited.sock; this is the socket you should point the ganglia-webfrontend CGIs to. It accepts only the FLUSH, STATS, and HELP commands, which prevents errant web apps from writing arbitrary rrd files. The -b option specifies the base directory to change to, and -B restricts file access to paths within the directory specified by -b.
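Once the daemon is running, it can be worth confirming that both sockets exist and respond. The following is a minimal sketch rather than part of Ganglia or RRDtool: the function name check_rrdcached_socket is hypothetical, and it assumes the socat utility is installed. It sends STATS, which is permitted on both the full and the limited socket in the command above.

```shell
# Hypothetical helper: confirm a socket exists and answers STATS.
# Assumes socat is installed; it is not shipped with Ganglia or RRDtool.
check_rrdcached_socket() {
    sock="$1"
    # -S is true only for an existing UNIX domain socket.
    if [ ! -S "$sock" ]; then
        echo "missing socket: $sock" >&2
        return 1
    fi
    # Show owner, group and mode so they can be compared with -s and -m.
    ls -l "$sock"
    # Ask the daemon for its statistics over the socket.
    printf 'STATS\n' | socat - "UNIX-CONNECT:$sock"
}
```

For example, run check_rrdcached_socket /tmp/rrdcached.limited.sock as the apache user to check that the web server can reach the limited socket.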
If you would like to start rrdcached using the init script provided, modify the rrdcached daemon user (--user) in /etc/init.d/rrdcached and place the rrdcached options in /etc/sysconfig/rrdcached.
Next, you will need to configure Ganglia to talk to rrdcached. You will need to make two changes, one related to the gmetad init script and one in the frontend's conf.php.
For this example we will be referring to the init script for Red Hat and its derivatives. The init script should source the file /etc/sysconfig/gmetad. You will need to edit the file and update the RRDCACHED_ADDRESS variable to point to the location of your rrdcached socket.
For other distributions, just make sure that prior to calling gmetad, the variable RRDCACHED_ADDRESS contains the socket location and is exported. For Debian/Ubuntu packages, the init script may source the file /etc/default/gmetad instead.
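As an illustration, the file sourced by the init script could contain something like the following. The socket path is the one used in the rrdcached command earlier in this document; adjust it if your socket lives elsewhere. The explicit export is only an assumption for init scripts that do not already export the variable themselves.

```shell
# Contents of /etc/sysconfig/gmetad (Red Hat) or /etc/default/gmetad (Debian/Ubuntu).
# The path matches the full-access socket created by the rrdcached command.
RRDCACHED_ADDRESS="unix:/tmp/rrdcached.sock"

# Only needed if your init script does not already export the variable.
export RRDCACHED_ADDRESS
```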
To configure the frontend, set the $rrdcached_socket variable in conf.php to the location of the rrdcached socket.
Now when you start gmetad, all RRDtool commands will go through the caching daemon. The same applies to the graphing functions in the web frontend.
If you have enabled gmetad and rrdcached to start up by default, make sure that rrdcached starts before gmetad and that gmetad stops before rrdcached.
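If you prefer to script the frontend change instead of editing the file by hand, the sketch below rewrites the existing $rrdcached_socket line with sed. The function name set_rrdcached_socket is hypothetical, and the location of conf.php depends on where your frontend is installed, so it is passed in as an argument. A backup copy with a .bak suffix is left next to the file.

```shell
# Hypothetical helper: set $rrdcached_socket in the frontend's conf.php.
# Usage: set_rrdcached_socket /path/to/conf.php unix:/tmp/rrdcached.limited.sock
set_rrdcached_socket() {
    conf_php="$1"
    socket="$2"
    # Replace the value of the existing assignment, keeping conf.php.bak as a backup.
    sed -i.bak 's|^\([$]rrdcached_socket[[:space:]]*=[[:space:]]*\).*|\1"'"$socket"'";|' "$conf_php"
}
```

Passing the limited socket, as in the usage comment, follows the advice above that the web frontend should not use the full-access socket.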
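When restarting the services by hand, the same ordering applies. The sketch below only illustrates that order. It assumes the init scripts are named rrdcached and gmetad and are driven through the service wrapper found on Red Hat style systems; the function names and the SERVICE_CMD variable are hypothetical.

```shell
# Hypothetical helpers; SERVICE_CMD defaults to the "service" wrapper
# and can be overridden if your distribution uses a different tool.
SERVICE_CMD="${SERVICE_CMD:-service}"

start_ganglia_stack() {
    # rrdcached must be listening before gmetad connects to it.
    "$SERVICE_CMD" rrdcached start && "$SERVICE_CMD" gmetad start
}

stop_ganglia_stack() {
    # gmetad goes down first so no update is sent to a stopped daemon.
    "$SERVICE_CMD" gmetad stop
    "$SERVICE_CMD" rrdcached stop
}
```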
For more information regarding rrdcached, including additional security policies that could be put in place, please refer to the rrdcached manpage.