Overview
========

These tests measure the performance of a file system and hardware
configuration on the 14 I/O Scenarios for the DARPA HPCS program. They are
intended to provide simplified test cases characterizing the I/O utilized by
Mission Partners, and are used to demonstrate the scalability of the storage.

The following fourteen (14) Scenarios were agreed upon by the HPCS Mission
Partners:

 1. Single stream with large data blocks operating in half duplex mode
 2. Single stream with large data blocks operating in full duplex mode
 3. Multiple streams with large data blocks operating in full duplex mode
 4. Extreme file creation rates
 5. Checkpoint/restart with large I/O requests
 6. Checkpoint/restart with small I/O requests
 7. Checkpoint/restart large file count per directory - large I/Os
 8. Checkpoint/restart large file count per directory - small I/Os
 9. Walking through directory trees
10. Parallel walking through directory trees
11. Random stat() system call to files in the file system - one (1) process
12. Random stat() system call to files in the file system - multiple processes
13. Small block random I/O to multiple files
14. Small block random I/O to a single file

These scenarios are file based, use OpenMP for multi-threading across shared
memory cores, and use MPI across distributed memory systems for the large
scaling tests. Note that ALL Scenarios shall be performed with a "Single
Shared Name Space" filesystem, and several tests require that they be run on
CPUs with "cache coherent shared memory".

Scripts
========

The scripts directory contains a pbs_prototype directory with a VARIABLE
file of site-specific variables. Running "generate_scripts" applies those
variables to the prototype files, creating the PBS scripts that will launch
jobs on a Cray; the generated scripts are placed in a subdirectory of
pbs_prototype. These generated scripts should be considered a starting
point: inspect and modify the resulting files to achieve the best test
results. A typical first run is sketched below.

The tests generate a CSV (Comma-Separated Values) file that can be imported
into a spreadsheet for further data processing and graphing. There is also a
script, postprocess.py, that can be used to get summary information and/or
averages based on rank or step.

The "unsupported" directory contains a cluster_scripts directory that may be
useful if you use mpirun on a "White Box Cluster".
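As a rough illustration, a first run might look like the following. This is
a minimal sketch, not a supported procedure: the generated script name
("scenario05.pbs"), the CSV file name ("results.csv"), and the postprocess.py
invocation are placeholders, and should be checked against the files the
tools actually produce at your site.

    #!/bin/sh
    # Sketch of a first run; paths, script names, and options are illustrative.
    cd scripts/pbs_prototype

    # 1. Set the site-specific variables (account, filesystem path, node
    #    counts, and so on) in the VARIABLE file before generating anything.
    vi VARIABLE

    # 2. Create the PBS scripts from the prototype files; they are written
    #    to a subdirectory of pbs_prototype.
    ./generate_scripts

    # 3. Inspect and adjust a generated script, then submit it to the batch
    #    system. "scenario05.pbs" is a placeholder for whatever names
    #    generate_scripts actually produced.
    qsub scenario05.pbs

    # 4. When the job finishes, summarize the CSV output. "results.csv" is
    #    a placeholder; check postprocess.py itself for the options it accepts.
    python ../postprocess.py results.csv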
Run notes
=========

It may be useful to turn on the --check flag, but doing so significantly
increases the run time, since it verifies that the data read back matches
the data written.

The full test suite takes a considerable amount of time to run as specified
in the Scenarios document. Once you have verified that the rate (bandwidth
or IOPS) has reached a steady state, the test can be considered complete. To
help you limit the time needed to complete the tests, time limits are
available for the tests. The time limit is per phase, and there are
generally several phases per test; it does not include the initialization
time, so the runs will be longer than the time requested.

Data Collection
===============

In addition to the output from the runs, it is important to describe the
systems under test, particularly what constitutes a "Scalable Storage Unit
(SSU)" and how many SSUs were used in the test.

Some of the parameters of the systems that should be collected are:

systems
-------
compute nodes
mds
mgs
oss
routers

On each system, gather the number of instances, as well as the system type,
speed, and latency numbers, as appropriate, for example:

operating system
cpu
memory
bus
interconnect/HSN

components
----------
front-end storage network
storage network switches
back-end storage network
disks

On each additional component in the data path, gather the number of items,
type, speed, and latency, and possibly the firmware version.

raid controller
---------------
On the "raid controller", note the hardware make and type, version, cache
size, number and type of interfaces, and RAID level.

filesystems
-----------
On the filesystems, note the parallel filesystem type and version, the
underlying filesystem type and version (if appropriate), the versions of
clients and servers, how big the filesystem is, and whether it is newly
formatted or aged.
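A small helper can capture a first cut of this information on each node. The
sketch below is not part of hpcs-io; it is an assumption about what is useful
to record, using only standard Linux commands. Extend it with your site's own
tools (RAID controller CLIs, fabric managers, parallel filesystem utilities,
and so on) for the component, controller, and filesystem details above.

    #!/bin/sh
    # collect_sysinfo.sh -- hypothetical helper, not part of hpcs-io.
    # Captures basic node and filesystem details next to the test output.
    OUT="sysinfo.$(hostname).txt"
    {
        echo "== operating system =="; uname -a
        echo "== cpu ==";              grep -m1 'model name' /proc/cpuinfo
        echo "== memory ==";           grep MemTotal /proc/meminfo
        echo "== mounted filesystems =="; mount
        echo "== capacity ==";         df -h
    } > "$OUT"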