CERRLA is the name given to the algorithm developed by me, Samuel Sarjant, at the University of Waikato for my PhD research. CERRLA stands for Cross-Entropy Relational Reinforcement Learning Agent. The full details can be found in my thesis titled: Policy Search Based Relational Reinforcement Learning using the Cross-Entropy Method.
No installation should be required beyond unpacking the files and running a java command. However, the algorithm does require jess.jar, which contains the JESS Rules Engine used to reason about the environments. Note that there are multiple zip files, but only the first is required to run the algorithm on the Ms. Pac-Man and Blocks World environments:
To run CERRLA in the latter three environments, unzip them into their respective folders and include their compiled class and jar files in the classpath.
Running CERRLA can be as easy as running one of the .bat files (on non-Windows OSes, modify the commands slightly). These files simply specify the java command that runs the program, taking a number of extra arguments (see [Arguments]).
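As a sketch, a .bat file typically contains little more than the java invocation itself. The jar, package, and main-class names below are illustrative placeholders, not the actual identifiers from the distribution:

```shell
rem Hypothetical launch command; substitute the real main class and jars.
java -cp "cerrla.jar;jess.jar" cerrla.CERRLA blocksWorldArguments.txt
```

On Linux or macOS, replace the `;` classpath separator with `:` and run the same command from a shell script.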
CERRLA is initialised with the same main class regardless of environment, and it loads the appropriate environment using an environment configuration file (using reflection based on the standard naming convention of the environment classes). These configuration files are named *Arguments.txt, where * roughly corresponds to the environment.
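The reflective lookup can be sketched as follows. The package and class names here (`blocksWorld.BlocksWorld`) are illustrative assumptions, not CERRLA's actual identifiers:

```java
// Sketch of locating environment classes by naming convention.
// The package/base names are hypothetical examples only.
public class EnvLoaderSketch {
    /** Builds the expected class name for a given environment role. */
    static String conventionName(String pkg, String base, String role) {
        return pkg + "." + base + role;
    }

    public static void main(String[] args) {
        String pkg = "blocksWorld";   // from the config file's first line
        String base = "BlocksWorld";  // naming-convention prefix
        System.out.println(conventionName(pkg, base, "StateSpec"));
        System.out.println(conventionName(pkg, base, "Environment"));
        // The real agent would then instantiate these reflectively, e.g.:
        // Object spec = Class.forName(specName).getDeclaredConstructor().newInstance();
    }
}
```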
Each environment config file contains (line-by-line):
+ The package name followed by an initial naming convention for the environment interface classes (these include the StateSpec and Environment classes, which are subclasses of StateSpec and RelationalEnvironment respectively).
+ The number of experiments to perform. This can be a single number, or a range in the form X-Y. Note that the experiment number is also the seed of the random number generator, so experiments are identical using identical experiment numbers.
+ The episode limit. If -1, there is no limit and the algorithm runs until convergence (this can take a LONG time in some environments). Otherwise, the algorithm performs exactly that many learning episodes (give or take 2 episodes, for fair testing of policy samples).
+ (Optional) The elites output file and the generator performance output file (on separate lines). If these are not specified, CERRLA creates new uniquely-named files using the experiment start date, the environment name, the environment goal, and any extra parameters for identification.
+ Additional environment parameters, separated by commas. These typically specify the goal and other arguments.
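Putting the lines above together, a hypothetical blocksWorldArguments.txt might look like this (every value is an illustrative placeholder, not taken from the actual distribution):

```
blocksWorld BlocksWorld
0-4
10000
elitesOutput.txt
performanceOutput.txt
goal=onAB,numBlocks=5
```

Here experiments 0 through 4 each run for 10000 episodes with the same goal, and each experiment number also seeds the random number generator, so re-running experiment 0 reproduces identical results.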
If successful, the experiment should begin and output should appear on the console (and a GUI window of the environment may also appear if applicable). To tweak how the experiment is run, the java command can take runtime [Arguments]. CERRLA also produces many [Outputs].