From: Nicholas H. <he...@se...> - 2003-11-13 22:01:23
On a SourceForge mirror near you:

Name     : Clubmask
Version  : 0.6
Release  : b1
Group    : Cluster Resource Management and Scheduling
Vendor   : Liniac Project, University of Pennsylvania
License  : GPL-2
URL      : http://clubmask.sourceforge.net
Download : http://sourceforge.net/project/showfiles.php?group_id=1316&release_id=197383

What is Clubmask
------------------------------------------------------------------------------
Clubmask is a resource manager designed to let Bproc-based clusters enjoy the full scheduling power and configurability of the Maui HPC Scheduler. Clubmask uses a modified version of the Supermon resource monitoring software to gather resource information from the cluster nodes. This information is combined with job submission data and delivered to the Maui scheduler. Maui issues job control commands back to Clubmask, which then starts or stops the job scripts using the Bproc environment.

Clubmask also provides built-in support for a supermon2ganglia translator, which allows a standard Ganglia web backend to contact Supermon and retrieve XML data for display through the Ganglia web interface.

Clubmask is currently running on around 10 clusters, ranging in size from 8 to 128 nodes, and has been tested with up to 5000 jobs.

Notes/warnings on this release:
------------------------------------------------------------------------------
- Before upgrading, save a copy of your /etc/clubmask/clubmask.conf file, as it may be overwritten. There are a few new variables in clubmask.conf, so beware!
- To use the resource requests, you must be running the latest snapshot of Maui.

Changes since 0.5:
------------------------------------------------------------------------------
- Changed job names from the god-awful absolute timestamp to a more normal "string.number" format, where "string" is an arbitrary job name and "number" is the Nth time that the job name has been used, e.g. root.1, root.2, ...
- Fixed cmnodesshknownhosts to get the -n information from the Bproc node number given as the argument.
- Updated to the latest Supermon APIs.
- Feature Request #790938: added 'cmsubmit -r <resid>' to run a job in a Maui reservation.
- Fixed bug #791396: processes are now killed properly in interactive jobs.
- resource_manager now makes sure Bproc is running when it starts.
- Reworked 'cmsubmit -h'; its output is now cleaner and easier to understand.
- Added support for resource requirements on the nodes: swap, mem, disk, qos, reservation, and processors per node are now supported. See 'cmsubmit -h' for more information.
- Added infrastructure for architecture, OS, network, and arbitrary features as node resource requests. We do not gather this information dynamically yet, so there is no point in letting people muck with it.
- Added a supermon_state daemon to manage the node list for Supermon, which keeps that logic out of resource_manager.
- Made sure there is at most one 'R' command in the pipeline for down nodes at any given time. There is no sense in asking nodes to revive if they have not responded to the last request yet.
- Cleaned up setup so that RPM builds are cleaner.
- Split /etc/clubmask/clubmask.conf into /etc/clubmask/{system,clubmask}.conf, so that variables that need user editing live in clubmask.conf and the remaining system variables live in system.conf. This lets a user update to a newer version of Clubmask and simply copy over the old clubmask.conf to restore their configuration.
- Migrated all docs from DocBook XML to LyX/LaTeX. All of the docs (PDF, single-page HTML, and multi-page HTML) can be generated with a simple 'make' in the docs/ directory.
- Added --secret-key to the setup.py arguments for building Maui and Clubmask with the same checksum key. This removes the need to edit setup.py when installing Clubmask.
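The announcement describes the new "string.number" job naming but not how Clubmask tracks the counter. The following is a minimal Python sketch of the scheme, assuming a simple in-memory counter per job name (a real resource manager would likely need to persist it across restarts). `JobNamer` and `next_id` are hypothetical names, not part of Clubmask.

```python
from collections import defaultdict


class JobNamer:
    """Hypothetical sketch of the "string.number" job-naming scheme:
    the Nth submission under a given name is called "name.N"."""

    def __init__(self):
        # Assumption: in-memory per-name counters; an unseen name starts at 0.
        self._counts = defaultdict(int)

    def next_id(self, name: str) -> str:
        # Bump the counter for this name and append it as the suffix.
        self._counts[name] += 1
        return f"{name}.{self._counts[name]}"


namer = JobNamer()
print(namer.next_id("root"))  # root.1
print(namer.next_id("root"))  # root.2
print(namer.next_id("sim"))   # sim.1
```

Each job name gets its own sequence, so "root" and "sim" are numbered independently.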
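The changelog states the rule for revive requests (at most one 'R' command in the pipeline for down nodes) but not how the pipeline is represented. A minimal Python sketch of that rule, assuming a single in-flight flag that is cleared when the previous 'R' request is answered; `ReviveGate`, `next_revive`, and `response_received` are hypothetical names, not Clubmask's API.

```python
from typing import Iterable, List, Optional


class ReviveGate:
    """Hypothetical sketch: allow at most one outstanding 'R' (revive)
    command for down nodes at any given time."""

    def __init__(self) -> None:
        # Assumption: one flag tracks whether an 'R' command is still unanswered.
        self._in_flight = False

    def next_revive(self, down_nodes: Iterable[int]) -> Optional[List[int]]:
        """Return the node numbers for a new 'R' command, or None if the
        previous request is still pending or no nodes are down."""
        nodes = sorted(set(down_nodes))
        if self._in_flight or not nodes:
            return None
        self._in_flight = True
        return nodes

    def response_received(self) -> None:
        """The nodes answered the last 'R' request; a new one may be sent."""
        self._in_flight = False


gate = ReviveGate()
print(gate.next_revive({3, 7}))  # [3, 7]
print(gate.next_revive({3, 7}))  # None: the last request is still pending
gate.response_received()
print(gate.next_revive({7}))     # [7]
```

Gating on an unanswered request, rather than on a timer, matches the changelog's reasoning: there is no point re-asking nodes that have not yet replied.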

Links
-------------
Bproc:          http://bproc.sourceforge.net
Ganglia:        http://ganglia.sourceforge.net
Maui Scheduler: http://www.supercluster.org/maui
Supermon:       http://supermon.sourceforge.net

Cheers~
Nic
-- 
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania