$Id: README,v 1.8 2003/01/22 05:06:28 sad Exp $

This is the Ganglia package for OSCAR Linux clusters.

Ganglia was developed at the University of California, Berkeley
Computer Science Division as part of the ongoing clustering research
project named The Millennium Project (and its predecessor The NOW
Project).  It is being developed by Matt Massie
<massie@cs.berkeley.edu>, Brent Chun <bnc@caltech.edu>,
Steven Wagner <swagner@ilm.com>, Federico Sacerdoti <fds@sdsc.edu>,
and other active developers.

For additional information about the Ganglia project, see:

http://ganglia.sourceforge.net/

The OSCAR documentation provides directions on how to use the OSCAR
Package Downloader (scripts/opd) to obtain additional OSCAR packages
from the available OSCAR package repositories. Packages like ganglia,
clumon, pvm-povray, and others are currently provided. For additional
information on how to package other cluster tools see the OSCAR
documentation.

This tarball should be expanded in the OSCAR distribution tree under
the directory "oscar-<version>/packages/".  After that, you'll have a
directory named oscar-<version>/packages/ganglia/, with all the
Ganglia files in it.  If this is done prior to the start of the OSCAR
install process, the packages will be picked up by the OSCAR install
suite (SIS) and be present on the resulting installed cluster (both
the server and client nodes).
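
The expansion step above can also be scripted.  The following is only a
minimal sketch, not part of OSCAR: the function name expand_package and
both of its arguments are hypothetical, and it assumes the tarball
unpacks into a top-level ganglia/ directory as described above.

```python
import os
import tarfile


def expand_package(tarball_path, oscar_root):
    """Extract a package tarball under <oscar_root>/packages/.

    Hypothetical helper: tarball_path is the downloaded Ganglia package
    tarball, oscar_root is the top of the oscar-<version> tree.
    """
    packages_dir = os.path.join(oscar_root, "packages")
    if not os.path.isdir(packages_dir):
        raise FileNotFoundError("no packages/ directory under %s" % oscar_root)
    # "r:*" lets tarfile detect the compression (gzip, bzip2, none).
    with tarfile.open(tarball_path, "r:*") as tar:
        tar.extractall(packages_dir)
    # Assumed layout: the tarball contains a top-level ganglia/ directory.
    return os.path.join(packages_dir, "ganglia")
```

Call it before starting the OSCAR install process, passing the path of
the tarball you downloaded and the path of your oscar-<version>
directory; the return value is the expected location of the Ganglia
files.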

WARNING: Ganglia relies heavily on IP multicast (it won't work
without a working multicast protocol stack).  Fortunately, the
multicast support in Linux is excellent.  Unfortunately, the quality
of the multicast support built into various network hubs, switches,
routers, etc. can be hit or miss.  In practice, this means you may
see some unusual issues with the Ganglia packages once they are up
and running.  These can include any or all of the following:

- Sometimes the compute nodes are marked "dead" when they are actually
  available.  This can happen after installation or after rebooting
  or adding a node.  If this occurs, restarting the Ganglia daemons
  (gmond) around the cluster seems to resolve the issue, e.g.,
  'cexec service gmond restart'.  The problem has been traced to the
  system time on the compute/client nodes being reset into the past.
  This usually happens under the following conditions: vmware sessions
  (vmware doesn't handle system time settings correctly), NTP (Network
  Time Protocol) adjustments, or a sys-admin resetting the clock.
  Even though this does not occur under normal circumstances on a
  working cluster, a patch has been submitted to the Ganglia
  developers to fix this minor issue.

- Even though the gmond daemon is running on all of the nodes (server
  and client nodes), you may only see the head node in the reported
  stats when pointing a web browser at the ganglia monitor web pages
  (http://localhost.head.node/ganglia/).  This means the multicast
  protocol may be blocked by the network hardware between the server
  node and the rest of the cluster.  If you see this, please check to
  make sure multicast is turned on in the configuration of your
  network hardware.
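
One way to check whether multicast traffic passes between two nodes,
independently of Ganglia, is a small send/listen probe.  The sketch
below is not part of Ganglia or OSCAR.  The group address 239.2.11.71
is believed to be gmond's default multicast channel, but that is an
assumption to verify against your own gmond configuration; the port
8650 is an arbitrary choice for this example so the probe does not
collide with a running gmond.

```python
import socket
import struct

# Assumptions for this sketch only: verify the group against your gmond
# configuration; the port is arbitrary and is NOT gmond's own port.
GROUP = "239.2.11.71"
PORT = 8650


def membership_request(group, interface="0.0.0.0"):
    """Build the ip_mreq structure (group + local interface) for a join."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface))


def listen(group=GROUP, port=PORT, timeout=30.0):
    """Join the group and wait for one datagram.

    Returns the sender's IP address, or None if nothing arrived in time.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        _data, sender = sock.recvfrom(65535)
        return sender[0]
    except socket.timeout:
        return None
    finally:
        sock.close()


def send(group=GROUP, port=PORT, ttl=1):
    """Send one probe datagram to the group (TTL 1 stays on the local net)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(b"multicast-probe", (group, port))
    finally:
        sock.close()
```

Run listen() on the head node, then call send() from a compute node
within the timeout.  If listen() returns None while ordinary unicast
traffic between the two nodes works, the network hardware in between
is a likely suspect.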

Feel free to post such problems to the oscar-users mailing list and we
will attempt to assist you in diagnosing the problem.  However, at some
point in that diagnosis process we may be reduced to telling you to
"get another brand of hub/switch/router."  Just a guess: approximately
95% of the various network hubs/switches/routers work fine, but
there is always the chance you could be part of that lucky 5%.  :-)


Additional notes:


- After installation, the Ganglia output can be viewed by pointing a
  web browser on the head node at:

        http://localhost/ganglia/

- There is also a command line tool to access the ganglia data
  stream, based on a Python class contributed by the Rocks team at
  the San Diego Supercomputer Center.  As detailed in the
  ganglia-user.pdf doc, this tool lets the user build scripts around
  access to the ganglia data stream.  It is installed as
  /usr/sbin/ganglia; running it with no options prints a usage
  statement.

- See the Ganglia user-level documentation in the directory:

        packages/ganglia/doc/ganglia-user.pdf
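
To illustrate what scripting against the ganglia data stream can look
like, here is a minimal sketch.  It is not the /usr/sbin/ganglia tool
and not the Rocks Python class.  It assumes that gmond serves an XML
dump to any client connecting to TCP port 8649 on the head node, and
that the dump contains HOST elements with a NAME attribute and nested
METRIC elements with NAME and VAL attributes; check both assumptions
against your installed version.  The function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host="localhost", port=8649, timeout=10.0):
    """Read the full XML dump from gmond (assumed to listen on TCP 8649)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond is assumed to close the connection when done
                break
            chunks.append(data)
    return b"".join(chunks)


def host_metrics(xml_bytes, metric_name):
    """Return {host name: value string} for one metric in the XML dump."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                result[host.get("NAME")] = metric.get("VAL")
    return result
```

For example, host_metrics(read_gmond_xml(), "load_one") would give the
one-minute load per node, assuming a metric by that name is being
reported on your cluster.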

---

Send questions, comments, and bug reports to the oscar-users mailing
list.  For details, see the main OSCAR web page:

        http://oscar.sourceforge.net/
Source: README.ganglia, updated 2003-01-22