Cbench Overview
Cbench is a framework for using various tests, benchmarks, applications, and utilities to stress and analyze *nix-based parallel compute clusters.
The Cbench project grew gradually from the frustration of having to redo the same work each time a new cluster was being integrated and brought online. Using the Cbench toolkit to assist with the running of various tests, system administrators are better able to focus their efforts on testing and debugging the system during integration and maintenance.
Cbench is utilized in a number of areas on clusters at Sandia National Labs:
- cluster interconnect performance testing and analysis
  - multiple bandwidth, latency, and collective tests
- cluster scalability testing and analysis
  - common benchmarks like Linpack, HPCC, NAS Parallel Benchmarks, Intel MPI Benchmarks, IOR
- cluster file system stress tests using a mix of job sizes and flavors
  - bonnie++, IOzone
- after maintenance, stressing the system with 10s to 1000s of jobs of various sizes and flavors
  - synthetic benchmarks and real-world apps
- cluster scheduler and resource manager testing
  - launching multitudes of scaling jobs over a set of nodes
- testing nodes for statistical variation in performance compared to the performance profile of the cluster
- providing the basis for a deterministic methodology that strictly details the testing process for returning broken hardware to general production usage
What Cbench IS
- a Perl-based scripting framework for building, running, and analyzing the output of various opensource codes
- a highly useful toolkit for stressing a system for maintenance or acceptance testing
- an easy way to benchmark and analyze a cluster using any of a variety of tests
- a project created by HPC system administrators and engineers at Sandia National Labs
- a toolkit that can be used in many ways for many different Linux cluster testing tasks
What Cbench IS NOT
- a benchmarking program - Cbench allows you to run benchmarking programs (in addition to utilities and applications) in an easy manner. However, Cbench does not actually benchmark anything itself.
- a well-polished user application - Cbench is developed by system engineers for use by system engineers. Its use requires a fair amount of knowledge about specifications of the cluster and runtime environment. While not overly complicated, Cbench isn't something we recommend for use by 'normal' users.
Goals
The overarching goal is to make cluster testing much more tractable, so that you can focus on the results you care about rather than on the mechanics of getting them.
- Make it easy to build all the source code gathered within Cbench and make getting bootstrapped on a new cluster as easy as possible.
  - Cbench tries to do this by centralizing as many configuration parameters as possible, including Make parameters and cluster definition parameters.
- Be able to quickly generate and run a large number of tests and then analyze the test data. To this end, Cbench uses the idea of 'test sets', which package up tests/benchmarks/applications into sets with utilities to:
  - generate jobs for batch and interactive execution
  - run many batch and interactive jobs easily
  - analyze the resulting mass of output and synthesize it
- Make it completely painless to switch between different batch environments and different job launch methods (e.g. mpiexec, prun, yod) when running tests.
Some examples that are relatively painless to do with Cbench:
- run 1000+ parallel jobs over a range of 2 to 1024 processors using 10-15 different tests/benchmarks overnight and, the next morning, analyze what happened in minutes
- plot results in semi real-time (using gnuplot) for supported tests as hundreds of jobs run through the scheduler
- for a set of supported test jobs, easily analyze success/failure ratios according to job size (number of processes)
- run node-level nightly burn-in tests (i.e. on a single node without worrying about MPI) on a 4500 node cluster, analyze the results, and generate a statistical performance and fault profile for the nodes
- switch MPI job launchers and batch schedulers without difficulty
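As a rough illustration of the success/failure analysis by job size mentioned above, here is a minimal Python sketch. It is not Cbench code (Cbench itself is Perl-based); the function name, the `(num_procs, passed)` input shape, and the sample data are all assumptions made up for this example.

```python
from collections import defaultdict


def success_ratio_by_size(results):
    """Return {num_procs: fraction of jobs that passed}.

    `results` is an iterable of (num_procs, passed) tuples. This input
    shape is hypothetical and is not how Cbench stores job results.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for num_procs, passed in results:
        totals[num_procs] += 1
        if passed:
            passes[num_procs] += 1
    # Sort by job size so the report reads from small to large jobs.
    return {n: passes[n] / totals[n] for n in sorted(totals)}


# Made-up job outcomes, for illustration only.
sample = [
    (2, True), (2, True),
    (64, True), (64, False),
    (1024, False), (1024, False), (1024, True), (1024, True),
]

for num_procs, ratio in success_ratio_by_size(sample).items():
    print(f"{num_procs:>5} procs: {ratio:.0%} passed")
```

Grouping by process count makes it easy to see whether failures cluster at larger job sizes, which is often the first question after an overnight run.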
Cluster Testing/Benchmarking System Levels
We have observed that Linux cluster testing/benchmarking seems to fall into three different levels. Each level requires different but interrelated toolsets.
Node Level
The foundational level is the Node level. At this level, one is only concerned with testing a node in isolation without worrying about high-speed interconnects, system MPI libraries, etc. Hardware burn-in testing during integration is one example of this process. Since you will often be running these tests on a cluster, you will probably want to test many isolated nodes in parallel. Some of the complexities at this level are:
- having a flexible set of tests to run on nodes
- having the ability to run tests on many nodes in parallel and organize the results
- analyzing the results from the tests as they are generated and comparing the results among the set of nodes running tests
- having the ability to quickly change the output parsing algorithms when new errors, conditions, data points, etc. are discovered to be of importance in test output
- comparing results of previous runs with current runs
- statistically characterizing the results to help focus on what really needs to be scrutinized for errors and aberrations in performance
Cbench has growing support for addressing testing at this level in the Nodehwtest Testset and currently has capabilities to address all of the complexities.
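To make the idea of statistically characterizing node results more concrete, the following Python sketch flags nodes whose result deviates strongly from the rest of the cluster. It is only an illustration under stated assumptions: it uses a modified z-score based on the median absolute deviation (a common outlier heuristic, not necessarily the statistic Cbench uses), and the node names, values, and the 3.5 threshold are invented for the example.

```python
import statistics


def flag_outlier_nodes(samples, threshold=3.5):
    """Return the sorted names of nodes whose value is a statistical outlier.

    `samples` maps a node name to one numeric result (for example a
    memory bandwidth figure). The modified z-score is computed from the
    median and the median absolute deviation (MAD), which stay stable
    even when a few nodes are badly off.
    """
    values = list(samples.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # At least half of the nodes report the same value; the score is
        # undefined here, so this sketch simply flags nothing.
        return []
    flagged = []
    for node, value in samples.items():
        score = 0.6745 * (value - median) / mad
        if abs(score) > threshold:
            flagged.append(node)
    return sorted(flagged)


# Hypothetical per-node results; n005 is clearly slower than its peers.
results = {
    "n001": 5020.0,
    "n002": 5010.0,
    "n003": 4995.0,
    "n004": 5005.0,
    "n005": 3900.0,
}
print(flag_outlier_nodes(results))
```

A median-based score was chosen for the sketch because a single very slow node would inflate a plain mean and standard deviation and could hide itself.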
Point-to-Point Level
The middle level is point-to-point system testing between nodes. This does not necessarily require utilizing any system MPI libraries or the high-speed interconnect. This level of testing can encompass anything from point-to-point netperf testing between all nodes (to test Ethernet links) to point-to-point Verbs-layer InfiniBand testing (to test IB network links below MPI). Cbench does not yet specifically address this area of testing.
MPI System Level
The top of the stack is the system level normally associated with Linux compute clusters, i.e. the MPI system level where parallel compute jobs run. At this level, one is concerned with testing the interaction of all the other system levels combined. As anyone who has been involved in cluster integration, testing, and/or characterization knows, there are many complexities in dealing with testing at this level. Some of the complexities are:
- dealing with all the combinations of
  - a set of tests/benchmarks to run
  - a range of process counts to test on, e.g. 2, 4, 8, ..., 1024
  - 1 process/node vs 2 processes/node, etc.
  - running tests in an interactive mode versus a batch mode
  - different batch schedulers
  - different job launchers (e.g. mpiexec, mpirun, prun)
  - different testing iterations or testing purposes from which to categorize results
- dealing with all the raw output in any sane way!
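To give a sense of how quickly these combinations multiply, here is a small Python sketch that enumerates a job matrix. It is not Cbench's job generator; the test names, executable paths, dictionary layout, and function name are hypothetical, and only the `mpiexec -n` and `mpirun -np` command forms are taken from common MPI usage.

```python
from itertools import product

# Hypothetical inputs, chosen only to illustrate the combinatorics.
tests = ["linpack", "iozone"]
proc_counts = [2, 4, 8]
procs_per_node = [1, 2]
launchers = {
    "mpiexec": "mpiexec -n {np} {exe}",
    "mpirun": "mpirun -np {np} {exe}",
}


def generate_jobs(tests, proc_counts, procs_per_node, launchers):
    """Build one job description per valid combination of parameters."""
    jobs = []
    for test, np_, ppn, (name, template) in product(
        tests, proc_counts, procs_per_node, launchers.items()
    ):
        if np_ % ppn != 0:
            # Skip layouts that cannot fill nodes evenly.
            continue
        jobs.append({
            "test": test,
            "procs": np_,
            "ppn": ppn,
            "nodes": np_ // ppn,
            "launcher": name,
            "command": template.format(np=np_, exe=f"./{test}"),
        })
    return jobs


jobs = generate_jobs(tests, proc_counts, procs_per_node, launchers)
print(len(jobs), "jobs generated")
print(jobs[0]["command"])
```

Even this toy matrix (2 tests, 3 process counts, 2 layouts, 2 launchers) yields 24 jobs, before batch versus interactive mode or different schedulers are taken into account.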
Cbench attempts to greatly simplify dealing with these complexities in several ways:
- centralize as much configuration as possible including Make parameters (make.def) and cluster definition parameters (cluster.def)
- provide the scripting infrastructure to easily generate/regenerate all the combinations of jobs desired at any time
- support different batch schedulers and job launchers in a core library, and easily switch between them by changing the appropriate values in cluster.def and regenerating job files
- utilize the "testset" concept and structure
- provide a modular output parsing structure to make it as easy as possible to add/change parsing logic for tests and still utilize the core output parsing analysis capabilities
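One way to picture a modular output parsing structure is a registry that maps each test name to its own parsing function, while a shared core handles status reporting. The Python sketch below illustrates that general pattern only; it does not reflect Cbench's actual Perl modules, and the test names, output formats, and function names are all invented for the example.

```python
import re

# Registry of per-test parsers: test name -> parsing function.
PARSERS = {}


def parser(test_name):
    """Decorator that registers a parsing function for one test."""
    def register(func):
        PARSERS[test_name] = func
        return func
    return register


@parser("bandwidth")
def parse_bandwidth(output):
    # Assumed output format: "bandwidth: 942.5 MB/s"
    match = re.search(r"bandwidth:\s*([\d.]+)\s*MB/s", output)
    return {"bandwidth_mb_s": float(match.group(1))} if match else None


@parser("latency")
def parse_latency(output):
    # Assumed output format: "latency: 1.8 us"
    match = re.search(r"latency:\s*([\d.]+)\s*us", output)
    return {"latency_us": float(match.group(1))} if match else None


def analyze(test_name, output):
    """Core analysis step shared by every test."""
    func = PARSERS.get(test_name)
    if func is None:
        return {"status": "no parser"}
    data = func(output)
    if data is None:
        # The expected result line was missing, so treat the job as failed.
        return {"status": "failed"}
    return {"status": "passed", **data}


print(analyze("bandwidth", "bandwidth: 942.5 MB/s"))
print(analyze("latency", "job killed by scheduler"))
```

With this layout, supporting a new test or a newly discovered error condition means adding or editing one small function, while the shared `analyze` step stays unchanged.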
News
- 01-14-2010: Tagged and released Cbench 1.2.2!
- v1.2.2 Changelog
- Complete revision history here.
- Openapps revision history here
- 03-27-2009: Tagged and released Cbench 1.2.1
- 03-25-2009: Migrated the live Cbench Trac site 100% to Sourceforge! We are now using the excellent Sourceforge Hosted Apps setup. Thanks to Ryan for all his help in migrating much of the Cbench wiki data into the new site.
- 10-20-2008: Tagged and released Cbench 1.2.0 !
- 09-02-2008: Tagged the 1.2.0rc2 release and made a tarball available here. The tarball is also on the Sourceforge download system. You can also check out the tag:
svn co https://cbench.svn.sourceforge.net/svnroot/cbench/tags/cbench-1_2_0rc2 cbench
- 08-13-2008: Tagged the 1.2.0rc1 release and made a tarball available here. The tarball has not been added to the Sourceforge mirrors yet because of their ongoing datacenter migration. In the meantime, you can check out the tag:
svn co https://cbench.svn.sourceforge.net/svnroot/cbench/tags/cbench-1_2_0rc1 cbench
- 03-10-2008: Started migrating to use the cbench Sourceforge project webserver instead of the cbench-sf Sourceforge project webserver, which is what we have been using. Two major changes have been accomplished:
- The Cbench Subversion repository is now hosted on Sourceforge!! The URL is https://cbench.svn.sourceforge.net/svnroot/cbench/trunk/cbench . Thanks to Chris for spearheading this!
- To switch checkout URLs see the Developer's Page
- The read-only mirror of the Cbench TRAC website was moved to the cbench webserver space and the cbench-sf webserver space is now just a pointer to http://cbench.sourceforge.net .
- We are still using TRAC hosted at http://cbench.sandia.gov as the live development site for Cbench (with no plans to ever leave TRAC)
- 12-21-2007: Cbench version 1.1.5 released and pushed out to Sourceforge, release notes can be found here.
- 11-19-2007: Preparing to freeze and release Cbench 1.1.5 very soon...
- 11-12-2007: Cbench received an honorable mention in an HPCwire article about Woven Systems and Chelsio Communications, Scalable High Performance 10 Gig Ethernet for Computing Clusters, which can be found here and here; a Woven press release is here.
Downloads
- Latest release: Cbench 1.2.2 (01-14-2010)
- Latest Release Changelog (latest Changelog on Sourceforge)
- Previous release: Cbench 1.2.1 (03-27-2009)
- Last Cbench 1.1.X release: Cbench 1.1.5 (12-21-2007)
- All Cbench downloads
- Licensing
Mailing Lists
- One size fits all list:
Documentation and Resources
- Documentation
- Whitepaper on cluster testing and Cbench
- Samples and Examples of Cbench at work
- Single node benchmarking reports
- dual socket, single core/socket server: PDF report, raw data
- Developers Home Page
- Cbench Sourceforge project homepage
- Information on Sandia National Labs Thunderbird Cluster on which Cbench is used heavily





