Dear all,
I am currently adding Opt4J support to SESSL, a Scala DSL to specify
simulation experiments (http://sessl.org).
The Opt4J tutorial is quite useful, but it does not include many
details on parallelization.
I found the class ParallelIndividualCompleter and tried to bind it to
IndividualCompleter, like this:
    new EvolutionaryAlgorithmModule() {
        @Override
        public void config() {
            super.config();
            bindConstant("maxThreads", ParallelIndividualCompleter.class).to(8);
            bind(IndividualCompleter.class).to(ParallelIndividualCompleter.class);
        }
    }
which works in a sense (multiple individuals are shown as being
evaluated at the same time by the GUI, see screenshot), but it still
runs only sequentially if you check CPU usage, so this must be
wrong.
Any thoughts, or maybe a pointer to a setup where parallelization works?
I'm pretty sure I can find a way around the problem myself, I just
don't know where to start.
Best regards,
Roland
Last edit: Roland Ewald 2013-05-16
Dear Roland,
thanks for your question. The tutorial tries to give an overview, but we will think about adding some details on parallelization.
First, you do not need to create your own module for the ParallelIndividualCompleter. It already exists (see attached screenshot). It is sufficient to add it to the rest of your modules.
Note that the parallel completer only parallelizes the completion of individuals, i.e., the decoding and evaluation. If your evaluators are computationally very cheap, e.g., the optimization benchmarks shipped with Opt4J where only a small mathematical function is called (e.g., summing all ones in LOTZ), then the overhead of doing this in parallel plus the overhead of the rest of the framework (optimizer, archive, GUI, ...) exceeds the benefit.
However, as soon as you have slightly more complex evaluators, parallel completion is beneficial.
For example, see the following (nonsense) increase of the computational complexity of the DTLZEvaluator (just adding a for-loop):
    @Override
    public Objectives evaluate(DoubleString x) {
        Objectives obj = null;
        // Repeat the same evaluation 10000 times to make it artificially expensive.
        for (int j = 0; j < 10000; j++) {
            final double g = g(x.subList(m - 1, n));
            final List<Double> f = f(x.subList(0, m - 1), g);
            obj = new Objectives();
            for (int i = 0; i < objectives.size(); i++) {
                Objective objective = objectives.get(i);
                obj.add(objective, f.get(i));
            }
        }
        return obj;
    }
Without it, the average workload on one CPU is ~13% (8-core Intel Core i7-3770), no matter whether I select the Sequential- or the ParallelIndividualCompleter. However, with the above increase in computational load, the average workload with the SequentialIndividualCompleter stays at ~120% while the ParallelIndividualCompleter achieves a workload of 93%.
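A comparison of this kind can be tried outside Opt4J with a plain thread pool. The following is only a minimal sketch under stated assumptions: the class CompletionSketch and its methods are hypothetical names made up for illustration and are not part of Opt4J, and the square-root loop merely stands in for an expensive evaluator. It shows the same list of "genotypes" being evaluated once sequentially and once on a fixed pool of worker threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for completing a population; this is not Opt4J code.
class CompletionSketch {

    // Artificially expensive evaluation: 'repetitions' plays the role of the added for-loop.
    static double evaluate(double genotype, int repetitions) {
        double sum = 0.0;
        for (int j = 0; j < repetitions; j++) {
            sum += Math.sqrt(genotype + j);
        }
        return sum;
    }

    // Evaluates one genotype after the other on the calling thread.
    static List<Double> completeSequentially(List<Double> genotypes, int repetitions) {
        List<Double> results = new ArrayList<>();
        for (double genotype : genotypes) {
            results.add(evaluate(genotype, repetitions));
        }
        return results;
    }

    // Submits every evaluation to a fixed-size pool and collects the results in input order.
    static List<Double> completeInParallel(List<Double> genotypes, int repetitions, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (double genotype : genotypes) {
                Callable<Double> task = () -> evaluate(genotype, repetitions);
                futures.add(pool.submit(task));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Double> genotypes = Arrays.asList(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8);
        int repetitions = 20_000_000;

        long start = System.nanoTime();
        completeSequentially(genotypes, repetitions);
        long sequentialMillis = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        completeInParallel(genotypes, repetitions, 8);
        long parallelMillis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("sequential: " + sequentialMillis + " ms, parallel: " + parallelMillis + " ms");
    }
}
```

With a very small repetitions value the pool setup and task hand-over tend to dominate, which mirrors the observation about cheap benchmark evaluators; the actual timings depend on the machine.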
Especially if you perform simulation experiments in the evaluators, you should see a similar performance increase. However, the performance of the overall optimization can be influenced by several other factors, such as:
- heap space of the JVM: java -Xmx4g
- synchronized methods in the evaluators
- @Singleton on the evaluator (avoid it)
- the selected optimization algorithm (simple rule: do not choose SMS-EMOA; the default evolutionary algorithm (= NSGA2), which you seem to have chosen, is fine)
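To illustrate the point about synchronized methods: the following minimal sketch, which again uses only the JDK, counts how many threads are inside an evaluation at the same time. SynchronizedEvaluatorSketch and its methods are hypothetical names for illustration, not Opt4J classes, and the 100 ms sleep stands in for a real simulation run. When the evaluation method is synchronized on a shared evaluator instance, the worker threads queue up behind one lock, so the work is effectively sequential even though several threads exist.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical evaluator shared by all worker threads; this is not Opt4J code.
class SynchronizedEvaluatorSketch {

    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger maxActive = new AtomicInteger();

    // Stand-in for an expensive evaluation; records the highest number of concurrent callers.
    private void simulate() {
        int now = active.incrementAndGet();
        maxActive.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        active.decrementAndGet();
    }

    // Only one thread at a time can hold the evaluator's monitor and run this method.
    synchronized void evaluateSynchronized() {
        simulate();
    }

    // No lock: several threads may run this method at once.
    void evaluateUnsynchronized() {
        simulate();
    }

    // Runs one evaluation per thread and returns the peak number of simultaneous evaluations.
    static int maxConcurrency(boolean useSynchronized, int threads) {
        SynchronizedEvaluatorSketch evaluator = new SynchronizedEvaluatorSketch();
        Runnable task = useSynchronized
                ? evaluator::evaluateSynchronized
                : evaluator::evaluateUnsynchronized;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(task);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return evaluator.maxActive.get();
    }

    public static void main(String[] args) {
        System.out.println("synchronized:   peak concurrency = " + maxConcurrency(true, 4));
        System.out.println("unsynchronized: peak concurrency = " + maxConcurrency(false, 4));
    }
}
```

A single shared instance is also what a @Singleton evaluator amounts to, which is presumably why the two items appear together in the list: any lock or mutable state on that one instance is shared by all completer threads.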
Hope I could help a little bit
Felix
Dear Felix,
thanks for your quick response!
Sorry, my screenshot was probably misleading: I'm configuring Opt4J via its API and only use the viewer to show intermediate results (or for debugging), by adding a ViewerModule to the arguments of org.opt4j.core.start.Task#init(...).
Thus, I have to configure the ParallelIndividualCompleter programmatically.
Anyhow, you were still right: the parallelization works, and the problem was in my code (a synchronized method in the Evaluator). Sorry for bothering you!
I am now configuring parallel execution as done in org.opt4j.core.common.completer.IndividualCompleterModule#configure(), which seems to work just fine. Again, many thanks for the help!
Best regards,
Roland
Dear Roland,
just a minor remark:
If you use org.opt4j.core.start.Task#init then you should also be able to use the existing module org.opt4j.common.completer.IndividualCompleterModule simply by adding it to your Task#init method.
    EvolutionaryAlgorithmModule ea = new EvolutionaryAlgorithmModule(); // your module 1
    DTLZModule dtlz = new DTLZModule(); // your module 2
    dtlz.setFunction(DTLZModule.Function.DTLZ1);
    ViewerModule viewer = new ViewerModule(); // your module 3
    // here comes the new module (the order does not matter)
    IndividualCompleterModule comp = new IndividualCompleterModule();
    comp.setType(IndividualCompleterModule.Type.PARALLEL);
    Opt4JTask task = new Opt4JTask(false);
    task.init(ea, dtlz, viewer, comp);
Enhancing the tutorial:
Dear Felix,
thanks, I haven't thought of that. Works like a charm (and helps to separate concerns)! :-)
Best regards,
Roland