
Cbench is a framework for benchmarking, stress-testing, and analyzing Linux-based parallel compute clusters.

The Cbench project grew gradually from the frustration of having to redo the same work each time a new cluster was being integrated and brought on-line.

By using the Cbench toolkit to run various tests, system administrators can better focus their efforts on testing and debugging the system during integration and maintenance.




What can we do with Cbench?

Cbench is utilized in a number of areas of HPC at Sandia National Labs:

  • interconnect performance testing and analysis
    • multiple bandwidth, latency, and collective tests
  • scalability testing and analysis
    • common benchmarks like Linpack, HPCC, NAS Parallel Benchmarks, Intel MPI Benchmarks, IOR
  • file system stress tests using a mix of job sizes and flavors
    • bonnie++, IOzone
  • stressing the system with 10s to 1000s of jobs of various sizes and flavors (typically after maintenance or upgrades)
    • synthetic benchmarks and real-world apps
  • scheduler and resource manager testing
    • launching multitudes of scaling jobs over a set of nodes
  • testing nodes for statistical variation in performance compared to the performance profile of the cluster
  • creating the basis for a deterministic, well-documented testing process for returning repaired hardware to general production use


Project Goals

The overarching goal is to make cluster testing far more tractable, so that you can focus on the testing itself rather than on the mechanics of getting there.

  1. Make it easy to build all the source code gathered within Cbench and to get bootstrapped on a new cluster with minimal effort.
    • Cbench tries to do this by centralizing as many configuration parameters as possible, including Make parameters and cluster definition parameters.
  2. Make it quick to generate and run a large number of tests and to analyze the resulting test data. To this end, Cbench uses the idea of 'test sets', which package up tests/benchmarks/applications into sets with utilities to do the following (a workflow sketch appears after this list):
    • generate jobs for batch and interactive execution
    • run many batch and interactive jobs easily
    • analyze the resulting mass of output and synthesize it
  3. Make it completely painless to switch between different batch environments and different job launch methods (e.g. mpiexec, prun, yod) when running tests.
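
To make the test-set idea concrete, a typical workflow looks roughly like the following shell session. Cbench test sets provide per-testset generate/start/parse utilities; the script names and options shown here (and the location of the bandwidth test set under $CBENCHTEST) are illustrative and may differ between Cbench versions, so treat this as a sketch rather than exact usage:

    # generate job scripts for the 'bandwidth' test set using the
    # process counts and launcher configured for this cluster
    cd $CBENCHTEST/bandwidth
    ./bandwidth_gen_jobs.pl

    # submit the generated jobs to the batch scheduler
    ./bandwidth_start_jobs.pl --batch

    # after the jobs complete, parse and summarize all of the output
    ./bandwidth_output_parse.pl

The same generate/start/parse pattern applies to each test set, which is what makes regenerating and rerunning large mixes of jobs cheap.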


What Cbench IS

  • a Perl-based scripting framework for building, running, and analyzing the output of various open source codes
  • a highly useful toolkit for stressing a system for maintenance or acceptance testing
  • an easy way to run benchmarks and analyze a cluster using any of a variety of tests
  • a project created by HPC system administrators and engineers at Sandia National Labs
  • a toolkit for flexible Linux cluster testing, from pre-acceptance testing to post-repair validation


What Cbench IS NOT

  • a benchmarking program - Cbench allows you to run benchmarking programs (in addition to utilities and applications) in an easy manner. However, Cbench does not actually benchmark anything itself.
  • a well-polished user application - Cbench is developed by system engineers for use by system engineers. Using it requires a fair amount of knowledge about the cluster's specifications and runtime environment. While not overly complicated, Cbench isn't something we recommend for 'normal' users.

Some examples that are relatively painless to do with Cbench:

  • run 1000+ parallel jobs over a range of 2 to 1024 processors using 10-15 different tests/benchmarks overnight and, the next morning, analyze what happened in minutes
  • plot results in semi-real-time (using gnuplot) for supported tests as hundreds of jobs run through the scheduler
  • for a set of supported test jobs, easily analyze success/failure ratios according to job size (number of processes)
  • run node-level nightly burn-in tests (i.e. on a single node, without worrying about MPI) on a 4500-node cluster, analyze the results, and generate a statistical performance and fault profile for the nodes
  • switch MPI job launchers and batch schedulers without difficulty


Cluster Testing/Benchmarking System Levels

We have observed that Linux cluster testing/benchmarking falls into three different levels. Each level requires a different but interrelated set of tools.

Node Level

The foundational level is the Node level. At this level, one is concerned only with testing a node in isolation, without worrying about high-speed interconnects, system MPI libraries, etc. Hardware burn-in testing during integration is one example of this process. Since you will often be running these tests on a cluster, you will probably want to test many isolated nodes in parallel. Some of the complexities at this level are:

  • having a flexible set of tests to run on nodes
  • having the ability to run tests on many nodes in parallel and organize the results
  • analyzing the results from the tests as they are generated and comparing the results among the set of nodes running tests
  • having the ability to quickly change the output parsing algorithms when new errors, conditions, data points, etc. are discovered to be of importance in test output
  • comparing results of previous runs with current runs
  • statistically characterizing the results to help focus on what really needs to be scrutinized for errors and aberrations in performance

Cbench has growing support for testing at this level in the Nodehwtest Testset, which currently has capabilities to address all of the complexities listed above; a sketch of a typical run follows.
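
As an illustration, a node-level burn-in run with the Nodehwtest Testset follows the same generate/start/parse pattern as the other test sets. The script names below are assumptions based on Cbench's per-testset utility convention and may not match a given release exactly:

    # generate node-level hardware test jobs (no MPI involved), run
    # them across many nodes in parallel, then characterize results
    cd $CBENCHTEST/nodehwtest
    ./nodehwtest_gen_jobs.pl
    ./nodehwtest_start_jobs.pl
    ./nodehwtest_output_parse.pl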

Point-to-Point Level

The middle level is point-to-point system testing between nodes. This does not necessarily require utilizing any system MPI libraries or the high-speed interconnect. This level of testing can encompass anything from point-to-point netperf testing between all nodes (to test out Ethernet links) to point-to-point Verbs-layer InfiniBand testing (to test out IB network links below MPI). Cbench does not currently specifically address this area of testing... yet!

MPI System Level

The top of the stack is the system level normally associated with Linux compute clusters, i.e. the MPI system level where parallel compute jobs run. At this level, one is concerned with testing the interaction of all the other system levels combined. As anyone who has been involved in cluster integration, testing, and/or characterization knows, there are many complexities in dealing with testing at this level. Some of the complexities are:

  • dealing with all the combinations of
    1. a set of tests/benchmarks to run
    2. a range of process counts to test, e.g. 2, 4, 8, ..., 1024
    3. 1 process/node vs. 2 processes/node, etc.
  • running tests in an interactive mode versus a batch mode
  • different batch schedulers
  • different job launchers (e.g. mpiexec, mpirun, prun, ...)
  • different testing iterations or testing purposes from which to categorize results
  • dealing with all the raw output in any sane way!

Cbench attempts to greatly simplify dealing with these complexities in several ways:

  • centralize as much configuration as possible, including Make parameters (make.def) and cluster definition parameters (cluster.def)
  • provide the scripting infrastructure to easily generate/regenerate all the combinations of jobs desired at anytime
  • support different batch schedulers and job launchers in a core library, and easily switch between them by changing the appropriate values in cluster.def and regenerating job files (see the excerpt after this list)
  • utilize the "testset" concept and structure
  • provide a modular output parsing structure to make it as easy as possible to add/change parsing logic for tests and still utilize the core output parsing analysis capabilities
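
For instance, switching the job launcher or batch scheduler is intended to be just an edit to cluster.def (which is Perl syntax) followed by regenerating the job files. The variable names in this excerpt are illustrative assumptions based on the parameters described above, not necessarily the exact names used by any particular Cbench release:

    # hypothetical excerpt from cluster.def; names are illustrative
    $cluster_name   = "examplecluster";
    $max_nodes      = 1024;      # total compute nodes
    $procs_per_node = 2;         # processes per node

    # changing these two values and regenerating job files is all it
    # should take to switch launch/scheduling environments
    $joblaunch_method = "mpiexec";   # e.g. mpiexec, mpirun, prun, yod
    $batch_method     = "torque";    # batch scheduler in use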

News

  • 01-14-2009: Tagged and released Cbench 1.2.2!
  • 03-27-2009: Tagged and released Cbench 1.2.1
    • The Changelog can be found here and here.
    • The complete revision history can be found here.
    • A complete history of changes in the Cbench Openapps tree since Cbench 1.2.0 up through Cbench 1.2.1 can be found here.
  • 03-25-2009: Migrated the live Cbench Trac site 100% to Sourceforge! We are now using the excellent Sourceforge Hosted Apps setup. Thanks to Ryan for all his help in migrating much of the Cbench wiki data into the new site.
  • 10-20-2008: Tagged and released Cbench 1.2.0 !
  • 09-02-2008: Tagged the 1.2.0rc2 release and made a tarball available here. The tarball is also on the Sourceforge download system. You can also check out the tag:
    svn co https://cbench.svn.sourceforge.net/svnroot/cbench/tags/cbench-1_2_0rc2 cbench
    
  • 08-13-2008: Tagged the 1.2.0rc1 release and made a tarball available here. The tarball has not been added to the Sourceforge mirrors yet due to their ongoing datacenter migration. In the meantime, you can just check out the tag:
    svn co https://cbench.svn.sourceforge.net/svnroot/cbench/tags/cbench-1_2_0rc1 cbench
    
  • 03-10-2008: Started migrating to use the cbench Sourceforge project web server instead of the cbench-sf Sourceforge project web server, which is what we have been using. Two major changes have been accomplished:
  • 12-21-2007: Cbench version 1.1.5 released and pushed out to Sourceforge, release notes can be found here.
  • 11-19-2007: Preparing to freeze and release Cbench 1.1.5 very soon...
  • 11-12-2007: Cbench received an honorable mention in an HPCwire article about Woven Systems and Chelsio Communications' Scalable High Performance 10 Gig Ethernet for Computing Clusters, which can be found here and here, and in a Woven press release here


Downloads

Mailing Lists

Documentation and Resources

Interesting Related Work by Others

  • Linux Cluster Production Readiness from Egan Ford

Related Projects


Affiliated Organizations



NON-Affiliated (but cool) Organizations

England National Team