Name | Modified | Size | Downloads / Week |
---|---|---|---|
LEIAME_PT-BR | 2012-04-08 | 2.7 kB | |
README_EN | 2012-04-08 | 2.7 kB | |
process.tar.gz | 2012-04-08 | 180.7 kB | |
Totals: 3 Items | | 186.1 kB | 0 |
TUTORIAL FOR USING THE EMPIRICAL PROCESS FOR CONFIGURING HADOOP EFFICIENTLY

REQUIREMENTS:
* Apache Hadoop. Available at: http://hadoop.apache.org/common/releases.html
* R (CRAN). Available at: http://cran.r-project.org/
  - Required R libraries:
    - FrF2;
    - rsm;
* Python support

DIRECTORIES AND FILES:
To perform the activities of the process step by step (ad hoc), use the directories 'atv_x'. Otherwise, execution is done from the directory 'base'.

1) Step-by-step execution:
* Directory 'atv_1': run the script 'aggregate_atv1.sh' and define the metrics of interest. Default: response_time and throughput.
* Directory 'atv_2': set the candidate parameters by adjusting their respective configuration files: 'par_core', 'par_hdfs' and 'par_mapred'.
* Directory 'atv_3_4': run the script 'run_exp_ah.sh' to run the experiment, stating the experimental design:
  1) Factorial Design (Activity 3) or
  2) Response Surface (Activity 4)

2) Regular execution (from 'base'):

base _
     |_ bin (binary files)
     |_ design_
     |         |_ [fd | rs]_ (types of experimental design)
     |                      |_ config_[fd | rs] (stores the variables of the experiment)
     |                      |_ files (stores the Hadoop configuration parameters)
     |                      |_ results (stores the measurements of the experiments)
     |                      |_ treats (stores the experimental treatments)
     |         |_ metrics (scripts relating to the metrics of interest)
     |_ reports (results and reports of the experiments)

* Directory 'bin/apps': contains the application scripts and any workload generators. Default: 'terasort.sh' (TeraSort) and 'pre_proc.sh' (TeraSort workload generator);
* Directory 'metrics': copy all the metrics.* files to this directory (default: response_time and throughput) (Activity 1);
* Directory 'design/[fd | rs]/files': defines the parameters that will be used for the Factorial Design (subdirectory 'fd') and Response Surface (subdirectory 'rs'). For either design, the parameters are adjusted through the subdirectory 'files'. Afterwards, the file config_[fd | rs] must be adjusted.
* File 'design/[fd | rs]/config_[fd | rs]': configure the experiment variables in the following order (one variable per line):
  - Name of the application;
  - Name of the metric of interest;
  - k: number of factors;
  - p: variable used for fractional experimental designs;
  - r: degree of replication of experiments;
* File 'bin/run_exp.sh': script to run the experiments;
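
As an illustration of the five-line config_[fd | rs] layout described above, here is a minimal Python sketch that reads such a file and estimates how many experiment runs it implies. It is not part of the distributed scripts. It assumes the file holds one bare value per line, that p follows the usual two-level fractional factorial convention (a 2^(k-p) design), and that the values shown (terasort, response_time, 5, 1, 3) are only examples. The names DesignConfig and parse_config are hypothetical. The run count applies to the factorial design ('fd') only; response surface designs are sized differently.

```python
from dataclasses import dataclass


@dataclass
class DesignConfig:
    """Hypothetical in-memory view of a config_fd file (one value per line)."""
    application: str  # name of the application
    metric: str       # name of the metric of interest
    k: int            # number of factors
    p: int            # fraction exponent (assumed 2^(k-p) convention)
    r: int            # degree of replication

    def total_runs(self) -> int:
        # A two-level fractional factorial design has 2^(k-p) treatments,
        # each of which is repeated r times.
        return self.r * 2 ** (self.k - self.p)


def parse_config(text: str) -> DesignConfig:
    """Parse the five values in the order given in the README."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) != 5:
        raise ValueError(f"expected 5 non-empty lines, got {len(lines)}")
    app, metric, k, p, r = lines
    cfg = DesignConfig(app, metric, int(k), int(p), int(r))
    if not (0 <= cfg.p < cfg.k):
        raise ValueError("p must satisfy 0 <= p < k")
    if cfg.r < 1:
        raise ValueError("r must be at least 1")
    return cfg


if __name__ == "__main__":
    # Example values only; a real config_fd may differ.
    sample = "terasort\nresponse_time\n5\n1\n3\n"
    cfg = parse_config(sample)
    print(cfg)
    print("runs:", cfg.total_runs())
```

With the sample values (k = 5, p = 1, r = 3) the design has 2^4 = 16 treatments and 48 runs in total, which gives a rough idea of how long an experiment will occupy the cluster before 'bin/run_exp.sh' is started.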