MiDataSets for MiBench (GPL license)
Developers:
Grigori Fursin (*), http://fursin.net/research
John Cavazos (**), http://homepages.inf.ed.ac.uk/jcavazos
Michael O'Boyle (**), http://www.dcs.ed.ac.uk/~mob
Olivier Temam (*), http://www.lri.fr/~temam
(*) INRIA Futurs, France
(**) University of Edinburgh, UK
Started in February, 2006
Development website:
http://midatasets.sourceforge.net
Remarks:
Though we made an effort to include only copyright-free datasets
from the Internet, mistakes are possible. In such cases, please
contact Grigori Fursin (grigori.fursin@inria.fr) as soon as possible
and we will try to resolve the issue.
********************************************************************************
Iterative optimization is now a popular technique for obtaining performance
or code size improvements over a compiler's default settings. However,
in most research projects, the best configuration is found for one
arbitrary dataset, and it is assumed that this configuration will work well
with any other dataset that the program uses. We created 20 different datasets
per program for the free MiBench benchmark suite (http://www.eecs.umich.edu/mibench)
to evaluate this assumption and analyze the behavior of various programs
with multiple datasets. We hope that this will enable more realistic
benchmarking and practical iterative optimization.
This work has been presented at HiPEAC'07:
Grigori Fursin, John Cavazos, Michael O'Boyle and Olivier Temam.
MiDataSets: Creating The Conditions For A More Realistic Evaluation
of Iterative Optimization. Proceedings of the International Conference
on High Performance Embedded Architectures and Compilers (HiPEAC 2007),
Ghent, Belgium, January 2007.
********************************************************************************
Datasets:
automotive_qsort_data
20 datasets, random numbers, different size
automotive_susan_data
20 datasets, pnm images, different size, different scenery
consumer_data
20 datasets, mp3 audio, different size, different bit-rate, different genres
20 datasets, wav audio converted from original mp3 datasets
consumer_jpeg_data
20 datasets, jpeg images, different size, different scenery
20 datasets, ppm images converted from original jpeg datasets
consumer_tiff_data
30 datasets, tiff images converted from original jpeg datasets
30 datasets, b&w tiff images converted from original jpeg datasets
30 datasets, tiff images without compression converted from original jpeg datasets
network_dijkstra_data
20 datasets, random numbers, random size
network_patricia_data
20 datasets, random numbers, random size
office_data
20 datasets, text files, different size, different genres
20 datasets, ps converted from original text datasets
20 datasets, pgp converted from original text datasets
20 datasets, enc converted from original text datasets
20 datasets, benc converted from original text datasets
20 datasets, small text files with random words in each line, different size
telecom_data
20 datasets, pcm audio converted from mp3 datasets
20 datasets, adpcm audio converted from mp3 datasets
telecom_gsm_data
20 datasets, au audio converted from mp3 datasets
20 datasets, gsm audio converted from mp3 datasets
********************************************************************************
Most of the source code has been slightly modified by Grigori Fursin
to simplify and automate iterative optimization. A loop wrapper has been
added around the main procedure to make some benchmarks run longer when
real execution time is used for measurements instead of a simulator
(we do not yet take cache effects into account; this is future work).
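The idea behind the loop wrapper can be sketched as follows. This is only a
Python analogue written for illustration; the actual wrappers live in the
modified benchmark sources, and the names run_with_loop_wrapper,
main_procedure and upper_bound are hypothetical.

```python
import time


def run_with_loop_wrapper(main_procedure, upper_bound):
    """Call main_procedure upper_bound times; return elapsed wall-clock seconds.

    Hypothetical helper: it only illustrates why repeating the main
    procedure makes a short benchmark long enough to time reliably.
    """
    if upper_bound < 1:
        raise ValueError("upper_bound must be at least 1")
    start = time.perf_counter()
    for _ in range(upper_bound):
        main_procedure()
    return time.perf_counter() - start
```

Dividing the returned time by upper_bound gives an average per-run time. As
noted above, repeated runs may see warmer caches than a single run would.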
Each directory has 3 Makefiles: one each for GCC, the Intel compilers and the PathScale compilers.
Each directory has a "__run" batch file to execute a benchmark. The first
parameter is the dataset number and the second optional parameter is the
upper bound of the loop wrapper around the main procedure.
If the second parameter is omitted, the loop wrapper upper bound
is taken from the file _run/_finfo_dataset.<dataset_number>.
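The fallback rule for the loop wrapper upper bound can be sketched in Python
as shown here. This is a minimal sketch, not the logic of the "__run" file
itself. It assumes that the first whitespace-separated token in
_run/_finfo_dataset.<dataset_number> is the integer bound; the actual file
layout is not described here, and resolve_loop_bound is a hypothetical name.

```python
from pathlib import Path


def resolve_loop_bound(dataset_number, explicit_bound=None, run_dir="_run"):
    """Return the loop wrapper upper bound for one dataset.

    An explicitly given bound wins; otherwise the value is read from
    <run_dir>/_finfo_dataset.<dataset_number>.
    """
    if explicit_bound is not None:
        return int(explicit_bound)
    info_file = Path(run_dir) / f"_finfo_dataset.{dataset_number}"
    # Assumption: the first token in the file is the integer upper bound.
    return int(info_file.read_text().split()[0])
```

For example, resolve_loop_bound(3, 100) returns 100 without touching the file
system, while resolve_loop_bound(3) reads _run/_finfo_dataset.3.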
Several batch files are included as examples to automate iterative optimizations:
all__create_work_dirs - creates temporary work directories for each benchmark
all__delete_work_dirs - deletes all temporary work directories
all_compile - compiles all benchmarks in the temporary work directories
all_run - runs all benchmarks with all datasets in the temporary work directories
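For users who prefer to drive the runs from a script, a per-benchmark loop
over datasets might look like the sketch below. It is not a replacement for
the batch files above. It assumes that "__run" can be executed directly as
./__run from the benchmark directory and that the caller supplies the dataset
numbers; build_run_command and run_all_datasets are hypothetical names.

```python
import subprocess


def build_run_command(dataset_number, loop_bound=None):
    """Build the argument list for one "__run" invocation.

    The first parameter is the dataset number; the optional second
    parameter is the upper bound of the loop wrapper.
    """
    command = ["./__run", str(dataset_number)]
    if loop_bound is not None:
        command.append(str(loop_bound))
    return command


def run_all_datasets(benchmark_dir, dataset_numbers, loop_bound=None):
    """Run "__run" once per dataset inside benchmark_dir, stopping on failure."""
    for dataset_number in dataset_numbers:
        # Assumption: "__run" is executable from within benchmark_dir.
        subprocess.run(
            build_run_command(dataset_number, loop_bound),
            cwd=benchmark_dir,
            check=True,
        )
```

A call such as run_all_datasets("some_benchmark_dir", range(1, 21)) would then
run 20 datasets in turn, assuming they are numbered from 1 to 20.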