Master/node system that gathers and graphs "everything" on your systems using Tobi Oetiker's rrdtool. It can optionally send warnings to your surveillance software. This software package was originally called LRRD. For more about the project, please see http://munin-monitoring.org/
Scalable, distributed monitoring system for high-performance computing
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. It is based on a hierarchical design targeted at federations of clusters, and it supports clusters of up to 2000 nodes.