
Benchmarking

Jany Belluz

How to compare various versions of µGP³ on various tests

The µGP³ distribution ships with a set of Python scripts that allow you to assess the performance of different versions of µGP³ on a given test, or to compare different settings of the same test with a given version of µGP³. These scripts can be found in the Benchmarks directory:

  • runbench.py: runs several versions of µGP³ on several test cases, using predefined random seeds, and stores the results. This document explains how to set up a benchmark using this script.
  • compare_statistics.py: draws graphs that compare the evolution of various internal parameters, using the results generated by the previous script.
  • graph_statistics.py: quickly plots the internal parameters of a single µGP³ run, using the resulting statistics.csv file.
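
runbench.py and compare_statistics.py are covered in detail below, while graph_statistics.py is not. Its exact command-line interface is not documented on this page, but assuming it simply takes the path to the statistics.csv file produced by a single run, an invocation might look like:

$ path/to/graph_statistics.py statistics.csv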

Install Python dependencies

These scripts need the following Python packages to be installed:

  • pandas
  • matplotlib
  • yaml (provided by the PyYAML package)
  • numpy
  • IPython

runbench.py also requires the portalocker.py script, which can be found in the same directory.
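
Assuming you install the dependencies with pip, a command like the following should work (note that the PyPI distribution names for the yaml and IPython modules are PyYAML and ipython):

$ pip install pandas matplotlib PyYAML numpy ipython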

Define a benchmark

A benchmark is defined in a YAML file, using the same format as the following example (example file name: bench.yaml):

# Directories of the tests to use for the benchmark, and all important files
tests:
  lampade_lua:
    files: [eval.lua, compat_eval.lua, parameters.txt, constraints.xml,
        population.settings.xml, ugp3.settings.xml]
  string_coverage_lua:
    precmd: "perl string-coverage.generator.pl $seed"
    files: [eval.lua, compat_eval.lua, testSet.txt,
        string-coverage.constraints.xml, string-coverage.population.xml,
        ugp3.settings.xml]

# Names of the MicroGP executables to run
ugp_executables:
  - ./ugp3_trunk
  - ./ugp3_camellia

# Random seeds to use
seeds: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

# Whether to show the full ugp output or only the generations
full_ugp_output: true

The names of the tests (here: lampade_lua and string_coverage_lua) are directories that runbench.py expects to find in the same directory as the YAML file. The same holds true for the listed names of µGP³ executables. Each executable will be run against each test.

The given list of seeds will be used to seed the random number generator of µGP³, so that each run is different. The number of seeds in this list also defines the number of runs of each executable/test pair. Therefore, this file will perform 2 * 2 * 20 = 80 runs, and will store the results into 2 * 2 = 4 subfolders of the Results directory.

Finally, the tool can either reproduce the full µGP³ output on the screen or filter it so that only generation numbers are shown. Since full_ugp_output is set to true in this example, the full output is reproduced.
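
Assuming the benchmark directory is called my_benchmark, as in the run command of the next section, the on-disk layout expected by runbench.py would look roughly like this sketch (the test directories contain the files listed above; the Results directory is created by hand, as explained in the next section):

my_benchmark/
    bench.yaml
    ugp3_trunk
    ugp3_camellia
    lampade_lua/          (contains the files listed under lampade_lua above)
    string_coverage_lua/  (contains the files listed under string_coverage_lua above)
    Results/              (created before running, see the next section)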

Run the benchmark

Before running the benchmark, you must create a Results folder in the same folder as the YAML file. The tool will put all the statistics there. Also, before running any test, the tool will copy the test files to a temporary directory that you must specify on the command line. To speed up the run, you can specify a directory that is mounted in memory. In this example we use /tmp.
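
For example, with the layout sketched in the previous section:

$ mkdir my_benchmark/Results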

When everything is ready, you can start the benchmark by opening a terminal and running:

$ path/to/runbench.py my_benchmark/bench.yaml /tmp

The script always checks the configuration before starting µGP³, so once you start seeing output, everything is OK.

Every (executable, test) pair creates a folder inside Results, and each seed produces a statistics file (<seed>.csv) inside this folder. The script also stores a verbose log for each seed (<seed>_verbose.log) which you can investigate later in case of strange results.
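
The resulting layout of the Results directory looks roughly like the following sketch (directory names combine the test name, the executable name and a timestamp, as in the examples of the next section; the per-seed file names correspond to the seeds list of the benchmark):

Results/
    lampade_lua_ugp3_trunk_2014-12-01_15:31:19/
        1.csv
        1_verbose.log
        2.csv
        2_verbose.log
        ...
    lampade_lua_ugp3_camellia_2014-12-01_15:12:34/
        ...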

If you have some processor time left, you can open other terminals and run the previous command several times in parallel: the script will run each configuration (executable, test, seed) only once.

You can also interrupt the script and start it again later: it will restart the (executable, test, seed) combination it was working on when it was interrupted.

If you change any of your test files, the script will detect it, re-run all the combinations involving the modified test, and store the results in a new subfolder of Results.

Examine the results

When runbench.py returns (maybe in several hours), you can use the compare_statistics.py script to compare graphically the results of several (executable, test) combinations. Look into the Results directory and find the directory names of the combinations you want to compare (for example: lampade_lua_ugp3_trunk_2014-12-01_15:31:19 and lampade_lua_ugp3_camellia_2014-12-01_15:12:34). Open a terminal and run:

$ path/to/compare_statistics.py Results/lampade_lua_ugp3_trunk_2014-12-01_15:31:19 Results/lampade_lua_ugp3_camellia_2014-12-01_15:12:34

The script will read all the .csv files and compute the mean and standard deviation of each internal parameter for each given result directory. This takes a few seconds, depending on the number of generations of your evolutionary runs.

When this precomputation step is complete, the script presents a prompt from which you can request graphs. The four functions to use are:

  • opuse(): will display a stacked histogram of operator usage with respect to the generation number.
    • You can filter the displayed operators by passing a filter argument, as in opuse(filter='scan') to plot only scan operators.
    • If the graphs take too long to appear on the screen, you can subsample the histogram by giving a group parameter, e.g. opuse(group=100) will show one bar for each period of 100 generations.
  • show('parameter name'): will display two graphs of the specified parameter, by default with respect to generation numbers.
    • You can give only part of the parameter name: if several parameters match, these two graphs will be shown for all matching parameters.
    • To plot with respect to the number of evaluations, specify the x='eval' parameter.
    • By default each line has error bars. You can show several other kinds of graphs: only the mean, or the min/max, or functional boxplots (does not work very well). See the help message printed by the tool.
  • boxplot('parameter name'): displays a box plot of the given parameter after the maximum common number of generations.
    • To plot at a specific number of generations, pass the argument pos, e.g. boxplot('Best', pos=1000) makes a boxplot of the best fitness value at generation 1000.
    • To use evaluation counts, use x='eval'.
  • boxplot_convergence('parameter name', <threshold>): displays a boxplot of the first generation at which the given parameter reaches the given threshold (the parameter is assumed to be monotonically increasing).
    • To get a boxplot of the first evaluation instead, pass y='eval'. E.g. boxplot_convergence('Best', 0.9, y='eval') will compare the number of evaluations required for the best fitness value to reach or exceed 0.9.
    • To consider a decreasing parameter, pass decreasing=True.

For example, to compare convergence speed you can call show('Best'), which will display the evolution of all fitness components of the best individual and/or group (see the example session below).

When you are done, you can close the script and all graphs by typing quit.
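
For instance, a session at the prompt might look like the following (these calls use only the functions and arguments described above; the exact prompt decoration depends on your IPython setup and is omitted here):

show('Best')
show('Best', x='eval')
opuse(filter='scan')
boxplot('Best', pos=1000)
boxplot_convergence('Best', 0.9, y='eval')
quit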

Limitations

  • runbench.py saves only the statistics in the Results folder, not the actual results of the algorithm. The actual results (best individual/group) are left in the temporary directory and discarded at the end of the run.
  • compare_statistics.py will behave badly on runs that involve several populations. Also, the graphs are not yet perfectly beautiful. Guidelines for beautiful graphs: https://github.com/jbmouret/matplotlib_for_papers

Related

Wiki: Changes introduced in Camellia
Wiki: Home
