The µGP³ distribution ships with a set of Python scripts that allow you to assess the performance of different versions of µGP³ over a given test, or to compare different settings of the same test with a given version of µGP³. These scripts can be found in the Benchmarks directory:
runbench.py
: runs several versions of µGP³ on several test cases, using predefined random seeds, and stores the results. This document explains how to set up a benchmark using this script.

compare_statistics.py
: draws graphs that compare the evolution of various internal parameters, using the results generated by the previous script.

graph_statistics.py
: quickly plots the internal parameters of a single µGP³ run, using the resulting statistics.csv file.

These scripts need some Python packages to be installed. In addition, runbench.py requires the portalocker.py script that can be found in the same directory.
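To give an idea of what graph_statistics.py produces, the following sketch plots one column of a statistics.csv file with pandas and matplotlib. It is not the actual script, and the column name "Best" is only an assumption about the file layout (it matches the parameter name used later in this document).

# Minimal sketch, not the actual graph_statistics.py: plot one internal
# parameter from a statistics.csv file. Assumes one row per generation and
# a column named "Best"; the real file layout may differ.
import pandas as pd
import matplotlib.pyplot as plt

stats = pd.read_csv("statistics.csv")
stats["Best"].plot()
plt.xlabel("Generation")
plt.ylabel("Best")
plt.show()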
A benchmark can be defined in a YAML file, using the same format as the following example (example file name: bench.yaml):
# Directories of the tests to use for the benchmark, and all important files
tests:
  lampade_lua:
    files: [eval.lua, compat_eval.lua, parameters.txt, constraints.xml,
            population.settings.xml, ugp3.settings.xml]
  string_coverage_lua:
    precmd: "perl string-coverage.generator.pl $seed"
    files: [eval.lua, compat_eval.lua, testSet.txt,
            string-coverage.constraints.xml, string-coverage.population.xml,
            ugp3.settings.xml]
# Names of the MicroGP executables to run
ugp_executables:
  - ./ugp3_trunk
  - ./ugp3_camellia
# Random seeds to use
seeds: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
# Whether to show the full ugp output or only the generations
full_ugp_output: true
The names of the tests (here: lampade_lua and string_coverage_lua) are directories that runbench.py expects to find in the same directory as the YAML file. The same holds true for the listed names of µGP³ executables. Each executable will be run against each test.
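Concretely, for the example above the benchmark directory would therefore contain something like the following layout (the Results folder, described below, must be created by hand; the file lists are abbreviated):

my_benchmark/
    bench.yaml
    ugp3_trunk
    ugp3_camellia
    lampade_lua/
        eval.lua, compat_eval.lua, parameters.txt, constraints.xml, ...
    string_coverage_lua/
        eval.lua, compat_eval.lua, testSet.txt, ...
    Results/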
The given list of seeds will be used to seed the random number generator of µGP³, in order to make each run different. The number of seeds in this list also defines the number of runs of each executable/test pair. Therefore, this file will perform 2 * 2 * 20 = 80 runs, and will store the results into 2 * 2 = 4 subfolders of the Results directory.
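This count can be reproduced directly from the configuration file; the following sketch (assuming PyYAML is installed, and not part of runbench.py itself) just restates the arithmetic above:

# Sketch: count the runs described by bench.yaml. Not part of runbench.py.
import yaml

with open("bench.yaml") as f:
    config = yaml.safe_load(f)

n_tests = len(config["tests"])            # 2
n_execs = len(config["ugp_executables"])  # 2
n_seeds = len(config["seeds"])            # 20

print("total runs:", n_tests * n_execs * n_seeds)  # 80
print("result folders:", n_tests * n_execs)        # 4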
Finally, the tool can either reproduce the full µGP³ output on the screen or filter it to show only the generation numbers; this is controlled by the full_ugp_output setting (set to true in this example, so the full output is shown).
Before running the benchmark, you must create a Results folder in the same folder as the YAML file. The tool will put all the statistics there. Also, before running any test, the tool will copy the test files to a temporary directory that you must specify on the command line. To speed up the run, you can specify a directory that is mounted in memory. In this example we use /tmp.
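Assuming the benchmark directory is called my_benchmark as in the command below, this amounts to:

$ mkdir my_benchmark/Results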
When everything is ready, you can start the benchmark by opening a terminal and running:
$ path/to/runbench.py my_benchmark/bench.yaml /tmp
The script always checks the configuration before starting µGP³, so if you start seeing µGP³ output, it means that everything is OK.
Every (executable, test) pair creates a folder inside Results, and each seed produces a statistics file (<seed>.csv) inside this folder. The script also stores a verbose log for each seed (<seed>_verbose.log), which you can investigate later in case of strange results.
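Judging from the directory names used later in this document, the contents of Results after a few runs might look roughly like this (the exact naming and timestamps are illustrative):

Results/
    lampade_lua_ugp3_trunk_2014-12-01_15:31:19/
        1.csv
        1_verbose.log
        2.csv
        2_verbose.log
        ...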
If you have some processor time left, you can open other terminals and run the previous command several times in parallel: the script will run each configuration (executable, test, seed) only once.
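For example, instead of opening several terminals you could launch a few instances in the background from a single shell; redirecting the output keeps their logs separate (this is just one possible way of doing it):

$ path/to/runbench.py my_benchmark/bench.yaml /tmp > run1.log 2>&1 &
$ path/to/runbench.py my_benchmark/bench.yaml /tmp > run2.log 2>&1 &
$ path/to/runbench.py my_benchmark/bench.yaml /tmp > run3.log 2>&1 &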
You can also interrupt the script and start it again later: it will restart the latest (executable, test, seed) combination that it was working on.
If you change any of your test files, the script will detect it, run all the combinations involving the modified test again, and store the results in a new subfolder of Results.
When runbench.py
returns (maybe in several hours), you can use the compare_statistics.py
script to compare graphically the results of several (executable, test) combinations. Look into the Results
directory and find the directory names of the combinations you want to compare (for example: lampade_lua_ugp3_trunk_2014-12-01_15:31:19
and lampade_lua_ugp3_camellia_2014-12-01_15:12:34
). Open a terminal and run:
$ path/to/compare_statistics.py Results/lampade_lua_ugp3_trunk_2014-12-01_15:31:19 Results/lampade_lua_ugp3_camellia_2014-12-01_15:12:34
The script will read all the .csv files and compute the means and standard deviations of each internal parameter for each given result directory. This will take a few seconds, depending on the number of generations of your evolutionary runs.
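Conceptually, this precomputation is similar to the following pandas sketch; the actual compare_statistics.py may handle the CSV layout differently:

# Sketch of the precomputation: per-generation mean and standard deviation
# of every numeric column, across all seed CSV files of one result folder.
# Assumes one row per generation in each file; compare_statistics.py may differ.
import glob
import pandas as pd

folder = "Results/lampade_lua_ugp3_trunk_2014-12-01_15:31:19"
runs = [pd.read_csv(path) for path in glob.glob(folder + "/*.csv")]

# Stack all runs, then group rows by their generation index (level 1).
stacked = pd.concat(runs, keys=range(len(runs)))
means = stacked.groupby(level=1).mean(numeric_only=True)
stds = stacked.groupby(level=1).std(numeric_only=True)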
When this precomputation step is complete, the script presents a prompt from which you can query graphs. The four functions to use are listed below; a short example session follows the list.

opuse()
: displays a stacked histogram of operator usage with respect to the generation number. The plot can be restricted to some operators with the filter argument, as in opuse(filter='scan') to plot only scan operators. Generations can be aggregated with the group parameter, e.g. opuse(group=100) will show one bar for each period of 100 generations.

show('parameter name')
: displays two graphs of the specified parameter, by default with respect to generation numbers. To plot against the number of evaluations instead, use the x='eval' parameter.

boxplot('parameter name')
: displays a box plot of the given parameter after the maximum common number of generations. A different position can be chosen with pos, e.g. boxplot('Best', pos=1000) makes a box plot of the best fitness value at generation 1000. As with show, x='eval' switches from generations to evaluations.

boxplot_convergence('parameter name', <threshold>)
: displays a box plot of the first generation at which the given parameter reaches the given threshold (the parameter is assumed to be monotonically increasing). The comparison can be made in terms of evaluations with y='eval'; e.g. boxplot_convergence('Best', 0.9, y='eval') will compare the number of evaluations required for the best fitness value to reach or exceed 0.9. For monotonically decreasing parameters, use decreasing=True.

For example, to compare the convergence speed you can call show('Best'), which will display the evolution of all fitness components of the best individual and/or group.
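A typical session at the prompt might then look like the following (the prompt itself is not shown, the annotations after # are only comments, and 'Best' is the parameter name used in the examples above):

show('Best')                        # evolution of the best fitness per generation
show('Best', x='eval')              # the same, against the number of evaluations
opuse(group=100)                    # operator usage, one bar per 100 generations
boxplot_convergence('Best', 0.9)    # generation at which the best fitness reaches 0.9
quit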
When you are done, you can close the script and all graphs by typing quit.
Note that runbench.py does not save the actual results of the algorithm in the Results folder, only the statistics. The actual results (best individual/group) are left in the temporary directory and discarded at the end of the run.

compare_statistics.py will behave badly on runs that involve several populations. Also, the graphs are not yet perfectly beautiful; guidelines for beautiful graphs: https://github.com/jbmouret/matplotlib_for_papers