# GSoC 2013

(Difference between revisions)

Line 21: Line 26: ==User Interface== ==User Interface== - ===Archipelago builder and inspector - difficulty 3=== + ==='''Interactive Cloud Optimisation via Google Talk (XMPP) - difficulty 4'''=== + [[File:Gtalk.png | 200px | left ]] + In order to increase PyGMO's usability and interactivity as well as improve user convenience and experience, we would like to extend PyGMO to be accessible through Google Talk interface. It will allow PyGMO users to turn their computer effectively into an optimisation server which can be controlled by sending commands via Google Talk and check on the progress of their jobs on remote computers or even mobile devices (i.e. Android phones and tablets). + +
+
1. + MOEA/D [5]: is a multiobjective optimization evolutionary algorithm based on decomposition. It decomposes a multiobjective optimization problem into a set of simple problems and then solves them in a collaborative way. +
2. Multi-Objective Particle Swarm Optimization [1]: it uses an external archive and a Pareto ranking schema to evaluate dominance between solutions and a geographically based approach to maintain the diversity.
3. Multi-Objective Particle Swarm Optimization [1]: it uses an external archive and a Pareto ranking schema to evaluate dominance between solutions and a geographically based approach to maintain the diversity. -
4. +
5. Niched Pareto Genetic Algorithm [2]: the tournament selection into the single-objective genetic algorithm is modified into Pareto dominance tournament. Two individuals randomly chosen are compared against a subset from the entire population, if one is dominated and the other one no, this one is chosen for reproduction, if there is a tie the result of the tournament is decided through a fitness sharing technique that maintains genetic diversity and favor the spread of the solutions along the Pareto front. Niched Pareto Genetic Algorithm [2]: the tournament selection into the single-objective genetic algorithm is modified into Pareto dominance tournament. Two individuals randomly chosen are compared against a subset from the entire population, if one is dominated and the other one no, this one is chosen for reproduction, if there is a tie the result of the tournament is decided through a fitness sharing technique that maintains genetic diversity and favor the spread of the solutions along the Pareto front. Line 191: Line 223: The task of the student is: The task of the student is: - * Implement the four multi-objective optimization techniques referring to the algorithms provided in the literature. + * Implement as many multi-objective optimization techniques from the list above as possible referring to the algorithms provided in the literature. - * Test the four algorithms on a set of multi-objective benchmark problems already available in the PaGMO/PyGMO framework. + * Test the implemented algorithms on the set of multi-objective benchmark problems already available in the PaGMO/PyGMO framework and possibly expand that suite. - '''Implementation Details''': Extend for each algorithm the base class available in Python or C++ ([http://pagmo.sourceforge.net/pagmo/algorithm_2base_8h_source.html algorithm/base.h] [http://pagmo.sourceforge.net/pagmo/algorithm_2base_8cpp_source.html algorithm/base.cpp]). The NSGA2 ([http://pagmo.sourceforge.net/pagmo/nsga2_8h_source.html nsga2.h] [http://pagmo.sourceforge.net/pagmo/nsga2_8cpp_source.html nsga2.cpp]) algorithm, the only Multi-objective Optimization strategy currently available, can be used as guideline. The method ''evolve'' of the class, that takes as argument an instance of the population, shall contain all the routines of the specific algorithms. + '''Implementation Details''': For each algorithm create a class that inherits from the base class pagmo::algorithm::base ([http://pagmo.sourceforge.net/pagmo/algorithm_2base_8h_source.html algorithm/base.h] [http://pagmo.sourceforge.net/pagmo/algorithm_2base_8cpp_source.html algorithm/base.cpp]). The NSGA2 ([http://pagmo.sourceforge.net/pagmo/nsga2_8h_source.html nsga2.h] [http://pagmo.sourceforge.net/pagmo/nsga2_8cpp_source.html nsga2.cpp]) algorithm, the only Multi-objective Optimization strategy currently available, can be used as guideline. The method ''evolve'' of the class, that takes as argument an instance of the population, contains most of the juice of specific algorithms. ''References'' ''References'' Line 205: Line 237: [4] Hernandez-Dıaz A.G., Santana-Quintero L.V., Coello Coello C.A., Molina J., [http://www.mitpressjournals.org/doi/pdf/10.1162/evco.2007.15.4.493 Pareto-Adaptive $\epsilon$-Dominance], Evolutionary Computation , Vol. 15(4), pp 493-517, 2007. [4] Hernandez-Dıaz A.G., Santana-Quintero L.V., Coello Coello C.A., Molina J., [http://www.mitpressjournals.org/doi/pdf/10.1162/evco.2007.15.4.493 Pareto-Adaptive $\epsilon$-Dominance], Evolutionary Computation , Vol. 15(4), pp 493-517, 2007. + + [5] Q. Zhang and H. Li, MOEA/D: A Multi-objective Evolutionary Algorithm Based on Decomposition, IEEE Trans. on Evolutionary Computation, vol.11, no. 6, pp712-731 2007 + '''Skills Required''': Good knowledge in C++ and Python '''Skills Required''': Good knowledge in C++ and Python Line 212: Line 247: '''Mentors''': [[Mentors#Lisa | Annalisa Riccardi]] and [[Mentors#Marcus | Marcus Märtens]] '''Mentors''': [[Mentors#Lisa | Annalisa Riccardi]] and [[Mentors#Marcus | Marcus Märtens]] - + ==='''Hyperfast hypervolumes for Multi-objective optimization - difficulty 4.5'''=== - ===Hyperfast hypervolumes for Multi-objective optimization - difficulty 4 === + [[File:Hypercube.gif | left ]] [[File:Hypercube.gif | left ]] - The hypervolume indicator is one of the most popular performance measures in multi-objective optimization. It measures the volume of the fitness space which is dominated by a set of solutions and is the only unary single set quality measure that is known to be strictly monotonic with regard to Pareto dominance (That is, the Pareto-optimal front guarantees the maximum possible hypervolume while any worse set will assign a lower hypervolume). + Multi-objective optimizers usually do not search for the single best solution of a problem, because there is not such a thing if you have more than 1 objective. Instead, they create a set of solutions that represent the best-possible trade-offs between all involved objectives. These solution-sets are called Pareto-approximations, as they try to approximate the Pareto-front of the problem. + + While a comparison of two solutions with respect to a single objective is straight-forward (take the better one), assigning a quality to a whole Pareto-approximation is a difficult matter. The hypervolume indicator can be used to assign a quality to a Pareto-approximation, defined by the volume of the fitness space that is dominated by the Pareto-approximation. It is the only unary single set quality measure that is known to be strictly monotonic with regard to Pareto dominance (That is, the Pareto-optimal front guarantees the maximum possible hypervolume while any worse set will assign a lower hypervolume). This is nice, since now we can compare different solution sets and take the one with the best hypervolume. - While being computationally #P-hard [1] most exact computations scale exponentially in the number of objectives. However, recent advances in the research on multi-objective optimization provide fast approximations on a reasonable precision that will allow the design of efficient evolutionary algorithms using the hypervolume as a mechanism for selection, diversification or archiving. The goal of this project is to enhance the multi-objective capabilities of PaGMO by concepts based on the hypervolume. The tasks of the student consist of + As determining the exact hypervolume of a number of points is computationally #P-hard [1], most exact computations unfortunately scale exponentially in the number of objectives. However, recent advances in the research on multi-objective optimization provide fast approximations on a reasonable precision that will allow not only to use the hypervolume to judge the quality of an algorithm, but also to design efficient evolutionary algorithms based on the hypervolume as a mechanism for selection, diversification or archiving. Thus, the hypervolume is an extremely versatile concept and would significantly level up PaGMOs capabilities for multi-objective optimization! We suppose, the students tasks should consist of

- * Adding exact and approximative methods for the computation of the hypervolume (like the ones from [1], [2], [3], [5], [6]) + * Adding exact and approximative methods for the computation of the hypervolume (candidates could be from [1], [2], [3], [5], [6]) - * Testing the performance and quality of the approximations for varying parameters on the multi-objective problems of PaGMO + * Testing the performance and quality of the approximations for varying parameters on the available multi-objective problems of PaGMO - * Adding one or more multi-objective algorithms based on the hypervolume indicator (i.e. [http://www.tik.ee.ethz.ch/sop/publicationListFiles/bz2008a.pdf Hype] [2]) + * Adding one or more multi-objective algorithms based on the hypervolume indicator (a candidate could be [http://www.tik.ee.ethz.ch/sop/publicationListFiles/bz2008a.pdf Hype] [2] or maybe SMS-EMOA [7]) * Implementing an efficient migration operator based on the contribution of single individuals to the hyper-volume (preferably based on [4]) * Implementing an efficient migration operator based on the contribution of single individuals to the hyper-volume (preferably based on [4]) - '''Implementation Details''': the population class needs to be enhanced by a method get_hypervolume. Given a set of reference points (a vector of fitness vectors) the method shall return a value reflecting the exact hypervolume of the population corresponding to the reference points. An optional keyword 'variant' shall be used to switch between the exact computation and different approximative methods. All algorithms using the hypervolume are supposed to use the functionalities of the population class and not compute the hypervolume by themselves. + '''Implementation Details''': the population class needs to be enhanced by a method get_hypervolume. Given a set of reference points (a vector of fitness vectors) the method shall return a value reflecting the exact hypervolume of the population corresponding to the reference points. An optional keyword 'variant' shall be used to switch between the exact computation and different approximative methods. All algorithms using the hypervolume are supposed to use the functionalities of the population class and not to compute the hypervolume by themselves! - A new replacement policy has to be added based on the exclusive hypervolume (i.e. the contribution of a single individual to the collective hypervolume). This replacement strategy should select the n members of the population which contribute the least to the hypervolume. Accordingly, a new selection policy shall send migrants with high contribution to the hypervolume in their population. + A new replacement policy has to be added based on the exclusive hypervolume (i.e. the contribution of a single individual to the collective hypervolume). This replacement strategy should select the n members of the population which contribute the least to the hypervolume. Accordingly, a new selection policy shall send migrants with high contribution to the hypervolume in the their home population to other populations. All functionalities need to be written in clean and documented C++ code and made available for the Python interface. Automated functional tests need to be added to PaGMO as well. All functionalities need to be written in clean and documented C++ code and made available for the Python interface. Automated functional tests need to be added to PaGMO as well. Line 247: Line 283: [6] Emmerich, Michael, and Carlos Fonseca. "Computing hypervolume contributions in low dimensions: asymptotically optimal algorithm and complexity results." Evolutionary Multi-Criterion Optimization. [6] Emmerich, Michael, and Carlos Fonseca. "Computing hypervolume contributions in low dimensions: asymptotically optimal algorithm and complexity results." Evolutionary Multi-Criterion Optimization. + + [7] Beume, Nicola, Boris Naujoks, and Michael Emmerich. "SMS-EMOA: Multiobjective selection based on dominated hypervolume." European Journal of Operational Research 181.3 (2007): 1653-1669. '''Skills Required''': Good knowledge in C++, Python '''Skills Required''': Good knowledge in C++, Python - '''Skills that help''': Thinking in more than 1 dimension, understanding of high dimensional geometries and not being afraid of recursion + '''Skills that help''': Thinking in more than 1 dimension, understanding of high dimensional geometries, the concept of Pareto-dominance and not being too afraid of recursion '''Mentors''': [[Mentors#Marcus | Marcus Märtens]] and [[Mentors#Luis | Luís F. Simões]] '''Mentors''': [[Mentors#Marcus | Marcus Märtens]] and [[Mentors#Luis | Luís F. Simões]] - ==Constraints Handling Techniques== ==Constraints Handling Techniques== - ===Constraints Handling Techniques in Evolutionary Algorithms - difficulty 5=== + ==='''Constraints Handling Techniques in Evolutionary Algorithms - difficulty 5'''=== [[File:cstrhandling.jpg | 200px | left]] [[File:cstrhandling.jpg | 200px | left]]

## Current revision as of 11:53, 14 April 2013

These are some projects we propose for the 2013 application of PaGMO/PyGMO to the Google Summer of Code. In case PaGMO/PyGMO will be selected as a mentoring organization we will pick the projects to actually carry out according to the best applications received.

You can also submit your own idea on what part of PaGMO you would like to develop, in which case provide as many details as possible in your application so that we can look for the best possible mentorship.

Start to get familiar with the code by:

1. In Python: install the PyGMO module for your python2.7 (the port to python 3 is under development). Follow the online PyGMO tutorials
2. In C++: get familiar with the classes problem.base, algorithm.base and topology.base
3. Instantiate some archipelagos with homogeneous and heterogeneous algorithms
4. Analyze results from some optimization
5. Contact us via the IRC channel, mail or mailing-list as to ask for more details on what is in our mind for the particular project you are interested in.

Interaction with the PaGMO/PyGMO community before, during and after the actual project duration is one of the key element of success for all projects.

Check the PyGMO dedicated web site !!

IMPORTANT: we want to make science while having fun! No background is required for the development of the projects below, but programming. The mentors are all qualified scientific programmers/researchers competent on the scientific part of the project. They are willing to bring people into the science of optimization/evolutionary computation, doing their best to make PaGMO/PyGMO programming a rewarding experience also from the scientific point of view.

## User Interface

### Interactive Cloud Optimisation via Google Talk (XMPP) - difficulty 4

In order to increase PyGMO's usability and interactivity as well as improve user convenience and experience, we would like to extend PyGMO to be accessible through Google Talk interface. It will allow PyGMO users to turn their computer effectively into an optimisation server which can be controlled by sending commands via Google Talk and check on the progress of their jobs on remote computers or even mobile devices (i.e. Android phones and tablets).

The student will be mentored to implement in Python a modular and extensible application that will provide as many as possible of the following functionalities:

• Create an interface to PyGMO via Google Talk to create an optimisation server on their own computers which could then be remotely accessible;
• Ability to execute the full set of existing PyGMO commands on the server from a remote client;
• Sending new problems/algorithms/topologies from a remote client either directly via text (meta-language or Python code) in the chat window or by uploading a description file via Google Talk;
• Displaying objects and data visually on an appropriate remote client (graphs, archipelago topologies, etc.);
• Implementing new methods that would allow a user to remotely check up on the progress and details of an optimisation process during run time;

Implementation Details: The student will be asked to implement the Google Talk communication functionality in Python and XMPP/Jabber.

The student is asked to write clean and well documented code.

Skills Required: Knowledge of Python and XMPP

Skills that help: Previous experience with web development and a good understanding of communication protocols

Mentors: Wiktor Piotrowski, Annalisa Riccardi

### Archipelago Builder and Inspector - difficulty 3

The standard PyGMO working cycle is to build an archipelago and run evolutions on it. This is usually done constructing islands and pushing them back into the archipelago. Then evolution starts. Then the optimized results are inspected. With this project we want to build a graphical interactive archipelago builder (GIABU). The user will be given the possibility to fire up GIABU and then select via simple gestures and menus the algorithm in each island, the migration paths and policies. He will be able to follow up the migrants and study what is working and what is not to then resume the evolution at the archipealgo level after having changed some of the islands, e.g. the algorithm or the selection policies etc.

The student will be mentored to implement in Python a modular and extensible graphical application that will provide as many as possible of the following functionalities:

• selection and tuning of optimization algorithms exposed in PyGMO;
• graphical configuration of the multiple sub-populations (islands) to undergo optimization. This includes setting up the topology of the archipelago, the multiple algorithms controlling each island, and the migrations between islands;
• possibility to generate a Python file with the sequence of PyGMO instructions needed to achieve the setup configured in the graphical application, either for loading back in future optimizations, or for the user to then manually extend the setup according to his/her needs;
• managing/monitoring of computing resources available through the network (visual interface to the MPI parallelization developed during a previous GSoC);
• plots updated in real-time with statistics from ongoing optimizations (at the level of the archipelago, but also of individual islands);
• visualization and inspection of solutions along the Pareto front in optimizations with two and three objectives;
• Parallel coordinates visualization showing relationships between performance and values in the multiple variables under optimization (ideally with a degree of interactivity similar to the one provided by parvis);

Implementation Details: The GUI will essentially serve to instantiate and explore interactively an object of the class PyGMO.archipelago. This is a rather complex object containing (essentially) a PyGMO.topology and a number of PyGMO.islands Each PyGMO.island contains a PyGMO.algorithm and a PyGMO.population. Each PyGMO.population contains a set of individuals and a PyGMO.problem. PyQT4 could be the underlying tool to achieve this goal, but other equivalent modules can also be proposed (such as wxWidgets and PyGTK; see the book Matplotlib for Python Developers for a good comparison between all of these).

A static, non-interactive visualization of archipelago topologies is already provided through the NetworkX library (see PyGMO/topology/__init__.py). The student may build on top of this structure, or reimplement a more suited visualization.

The student is asked to write clean and well documented code.

Skills Required: Knowledge of Python

Skills that help: An artistic sense for the beautiful & functional!

Mentors: Wiktor Piotrowski and Dario Izzo

### Dr.PaGMO/PyGMO Model Inspector - difficulty 3

Optimization problems are often provided as black box functions. A number of exploratory techniques based on sampling of the search space are available to help to gain knowledge about the problem [1] [2]. In particular we are interested about problem characteristics that can be critical for the performance of the algorithms or those that can help the user to reformulate the problem in a more efficient way. The student will be involved in the development of the initial version of the module. In particular he will develop the methodology for the analysis of:

• Degree of non linearity of the problem: study of the convexity of the function shape through sampling of the search space and statistical evaluation of its degree of nonlinearity, refers to [1], paragraph 2.1, for details.
• Box constraints hypervolume computation.
• Analysis of constraints redundancies: computation of the constraints effectiveness as a fraction of the sampled points that violates the constraint (equality constraints are treated with three indexes: violated per defect, violated per excess and satisfied within tolerances; the constraints is violated if one of the first two index is greater than zero and the others zero); see [1] for details.
• Analysis of design variables redundancies: heuristic sensitivity analysis on the effectiveness of the optimization variables on the objective function through derivative approximations of the objective function in the sampling point of the search space.
• Analysis of objectives redundancies: sampling of the search space, evaluation of the correlation matrix of the data matrix (MxN where M is the number of objectives and N the number of sample points) in the standardized form (see [3], section 5.2 for details)
• Disconnected feasible regions: evaluated statistically as in the case of the estimation of non linearities, distinguishing between transition from feasible to non feasible regions.
• Function range and level set: all the function evaluations performed during the previous analysis are collected to evaluate the range of the objective function and its distribution.

Implementation Details: the student will implement a standing alone module that requires as input only the problem definition (problem/base.cpp, problem/base.h). The module can be coded in C++ or Python and shall include a list of the tests that can be performed, a visualization of the results and the writing of a final report.

References

[1] John W. Chinneck, Discovering the Characteristics of Mathematical Programs via Sampling, Optimization Methods and Software, vol. 17 (2), pp. 319-352, 2002.

[2] Mersmann O., Bischl A.B., Trautmann H., Preuss M., Weihs C., Rudolph G., Exploratory landscape analysis, Proceedings of the 13th annual conference on Genetic and evolutionary computation, pp 829-836, 2011.

[3] Deb, K. and Saxena, D.K. On Finding Pareto-Optimal Solutions Through Dimensionality Reduction for Certain Large-Dimensional Multi-Objective Optimization Problems. KanGAL Report No. 2005011, IIT, Kanpur, 2005.

Skills Required: Good knowledge in C++ and Python

Skills that help: Having a problem solving attitude

Mentors: Annalisa Riccardi and Francisco Fernandez Navarro

## Benchmarking and statistical comparisons

### Racing population individuals and algorithms - difficulty 3.5

The technique commonly referred to as "racing" selection [1-3], is applicable in general to rank stochastic variables. It can be broadly described as a procedure that ranks by locating the 'bad ones' as soon as statistically sufficient evidence is gathered. Connected with sound statistical results, the Hoeffding races [1,2] or the Bernstein races [3], for example, are able to significantly reduce the number of evaluations needed to obtain a given confidence level in the ranking.

Two application targets are envisioned for racing methods in PaGMO/PyGMO:

• Racing the algorithms running in different islands [4,5,6]
• An optimizer's performance varies across problems. Having no prior knowledge of which optimizer will be better suited to a given problem, the user is forced to manually try out different optimizers, and different parameter settings within an optimizer. PaGMO greatly facilitates this task, by enabling the user to easily deploy a wide variety of optimizers over the same problem, but the task of further selection and tuning is still left to the user. By implementing racing methods, we aim to automate this process. The user should be able to deploy PaGMO's assorted optimizers on a given problem, and PaGMO should monitor their progress, dynamically allocating effort to the most successful ones, and stopping as soon as possible the execution of those that have statistically been found to be inferior on the present problem. Taking this work one step further, racing can be used not only to select the best optimizer for a given problem, but also for tuning it (racing between different parametrizations of an optimizer, leading to the identification of the highest performing configuration);
• Racing individuals in a population
• individuals in a stochastic optimization problem could be ranked by a racing method. The common practice of evaluating all individuals in a population over a whole training set of instances (such as over a set of possible input values for the Neural Network under optimization), to average out their performance, is needlessly expensive. The optimizer usually demands solely the capacity to distinguish good from bad solutions, and cares nothing for having the best possible estimates of their absolute performance. A racing method is able to allocate per individual the minimal number of evaluations that suffices to distinguish its performance level from that of other individuals it is racing with. A thorough implementation of population racing will accommodate itself to the optimizer's population dynamics: a tournament selection in a genetic algorithm should invoke a race only between the individuals in the tournament (and should those individuals later enter new tournaments/races, their previous evaluations should be remembered, and provide a basis for the new race), in Particle Swarm Optimization, using a ring topology, a particle should invoke a race among those particles that surround it in the ring, so as to determine which one is the best, and should therefore influence its own trajectory. In both these examples, racing is made to deliver greater performance still, by breaking down a population's evaluation in sets of smaller, overlapping, races.

The goal of this project is therefore to provide PaGMO users with the possibility to easily race individuals and algorithms, and with it deliver gains in computation time, and in abstraction of the specific internal details of the used optimizers.

Implementation Details:

1. The student will implement a C++ class called pagmo::algorithm::race able to rank different algorithms.
2. The student will implement, in the class pagmo::population, a method called race() which will work only if the pagmo::problem is stochastic (i.e. it inherits from pagmo::problem::base_stochastic). The method evaluates individuals of the population over different scenarios (input data instances, random number generator seeds that initialize a simulator on which the evaluation will take place, ...). As statistical evidence is collected, different individuals are gradually dropped from the race, until a satisfactory ranking can be delivered back to the optimizer. At the end, the raced individuals are left with a fitness value which is the average over the multiple runs made.
3. The student will imlement an algorithm that makes use of the method pop.race to effectively solve a stochastic optimization problem

All the work will then need to be exposed to Python accordingly.

Below is an example of how a basic API for population racing might look like:

pagmo::problem::rosenbrock prob(10);
pagmo::population pop(prob,20);
pop.race();
size_t idx0 = pop.get_best_idx();


References

[1] Maron, O., & Moore, A. W. (1994). Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation. In G. T. & J. A. Jack D. Cowan (Ed.), Advances in Neural Information Processing Systems (Vol. 6, pp. 59–66). San Francisco, CA: Morgan Kaufmann.

[2] Maron, O., & Moore, A. W. (1997). The Racing Algorithm: Model Selection for Lazy Learners. Artificial Intelligence Review, 11(1-5), 193–225. doi:10.1023/A:1006556606079

[3] Mnih, V., Szepesvári, C., & Audibert, J.-Y. (2008). Empirical Bernstein stopping. Proceedings of the 25th international conference on Machine learning - ICML ’08 (pp. 672–679). New York, New York, USA: ACM Press. doi:10.1145/1390156.1390241

[4] Birattari, M., Stützle, T., Paquete, L., & Varrentrapp, K. (2002). A Racing Algorithm for Configuring Metaheuristics. GECCO ’02 Proceedings of the Genetic and Evolutionary Computation Conference (pp. 11–18). Morgan Kaufmann Publishers Inc.

[5] Yuan, B., & Gallagher, M. (2004). Statistical Racing Techniques for Improved Empirical Evaluation of Evolutionary Algorithms. In X. Yao, E. K. Burke, J. A. Lozano, J. Smith, J. J. Merelo-Guervós, J. A. Bullinaria, J. E. Rowe, et al. (Eds.), Parallel Problem Solving from Nature - PPSN VIII (pp. 172–181). Springer. doi:10.1007/978-3-540-30217-9_18

[6] Yuan, B., & Gallagher, M. (2007). Combining Meta-EAs and Racing for Difficult EA Parameter Tuning Tasks. In F. G. Lobo, C. F. Lima, & Z. Michalewicz (Eds.), Parameter Setting in Evolutionary Algorithms (pp. 121–142). Springer. doi:10.1007/978-3-540-69432-8_6

Skills Required: Knowledge of boost::math and boost::python libraries.

Skills that help: a competitive spirit

Mentors: Luís F. Simões, Dario Izzo

### Statistical Comparison of Algorithms using Non-Parametric Tests - difficulty 2.5

Over the last years, the machine learning and optimization community has become increasingly aware of the need for statistical validation of the published results. In these communities, a performance analysis of the results using a parametric statistical treatment could lead to mistaken conclusions, since a previous evaluation of the typical metrics in these communities generally results in rejecting the normality and the equality of the variance hypothesis. Furthermore, Dempsar [1] suggested that the independence condition was not truly verified in ten-fold cross validation (a portion of samples was used either for training or testing in different partitions).

Therefore, in order to determine the statistical significance of the rank differences observed for each algorithm in the different problems, a non parametric Friedman test should be carried out [1] with the ranking of the metric considered as the test variable.

With this test, we should reject or not the null-hypothesis stating that all algorithms perform equally in mean ranking of the metric considered. In the case that the Friedman test shows significant differences in the results, different post-hoc statistical analyses are required. For all these post-hoc tests, it is necessary to select the best performing algorithm (in terms of mean ranking) as the control method for comparison with the rest of the models. The post-hoc tests considered in this project are:

• The Bonferroni-Dunn test [1]
• The Hochberg’s test [2]
• The Holm test [2]

Implementation Details: The student will be mentored to develop a GUI to run a subset of algorithms and a subset of problems (selected by the user) and then, provide the statistical information generated by the above-mentioned non-parametric tests. Please note that the Friedman test is available in the scipy statistical package (www.scipy.org), and the remaining statistical tests are included in [3] (Java version)

References

[1] J. Demssar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7 (2006) 1-30

[2] Y. Hochberg, A. Tamhane, Multiple Comparison Procedures, John Wiley & Sons, 1987

[3] Salvador García, and Francisco Herrera.. An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons. Journal of Machine

Skills Required: Good knowledge in Python and Statistics

Skills that help: Like playing with numbers

Mentors: Francisco Fernandez Navarro and Dario Izzo

## Multi-Objective Optimization

### Enrichment of the Multi-Objective Optimization Algorithms Set - difficulty 3

At the current state the PaGMO/PyGMO library is able to define and solve single and multi-objective optimization problems. The set of algorithms available to solve multi-objective problems is limited to the genetic algorithm NSGA2 and the Improved Harmony Search. Multi-objective optimization problems more often arise in practical optimization problems where the interested is not limited to minimize a single objective function while to achieve a trade-off between different objectives. In literature several stochastic multi-objective optimization techniques have been developed. The integration of some of them in the current PaGMO/PyGMO framework will be a further enrichment for the library. In particular the following algorithms have been selected for the different way of tackling the multi-objective nature of the problem:

1. MOEA/D [5]: is a multiobjective optimization evolutionary algorithm based on decomposition. It decomposes a multiobjective optimization problem into a set of simple problems and then solves them in a collaborative way.
2. Multi-Objective Particle Swarm Optimization [1]: it uses an external archive and a Pareto ranking schema to evaluate dominance between solutions and a geographically based approach to maintain the diversity.
3. Niched Pareto Genetic Algorithm [2]: the tournament selection into the single-objective genetic algorithm is modified into Pareto dominance tournament. Two individuals randomly chosen are compared against a subset from the entire population, if one is dominated and the other one no, this one is chosen for reproduction, if there is a tie the result of the tournament is decided through a fitness sharing technique that maintains genetic diversity and favor the spread of the solutions along the Pareto front.
4. Strength Pareto Evolutionary Algorithm [3]: it uses an external archive updated at each iteration, it contains the set of non dominated solutions. For each individual in the external set a strength index is computed (used as a fitness), which represents the percentage of individuals in the current population dominated by the solution in the archive.
5. Pareto Adaptive ε-Dominance [4]: meta-algorithm that implements an archive strategy based on adaptive grids along the Pareto front to maintain diversity and cover uniformly the front.

The task of the student is:

• Implement as many multi-objective optimization techniques from the list above as possible referring to the algorithms provided in the literature.
• Test the implemented algorithms on the set of multi-objective benchmark problems already available in the PaGMO/PyGMO framework and possibly expand that suite.

Implementation Details: For each algorithm create a class that inherits from the base class pagmo::algorithm::base (algorithm/base.h algorithm/base.cpp). The NSGA2 (nsga2.h nsga2.cpp) algorithm, the only Multi-objective Optimization strategy currently available, can be used as guideline. The method evolve of the class, that takes as argument an instance of the population, contains most of the juice of specific algorithms.

References

[1] Coello Coello C.A., Salazar Lechuga M. MOPSO : A Proposal for Multiple Objective Particle Swarm, in Congress on Evolutionary Computation, pp. 825-830. IEEE, 2002.

[2] Horn J., Nafploitis N., and Goldberg D. E., A Niched Pareto Genetic Algorithm for Multiobjective Optimization, in Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE Press, pp. 82–87, 1994.

[3] Zitzler, E., Laumanns, M., & Thiele, L. SPEA2: Improving the Strength Pareto Evolutionary Algorithm. Technical Report 103, Computer Engineering and Networks Laboratory (TIK), Swiss Federal Institute of Technology (ETH). Zurich, Switzerland, 2001.

[4] Hernandez-Dıaz A.G., Santana-Quintero L.V., Coello Coello C.A., Molina J., Pareto-Adaptive ε-Dominance, Evolutionary Computation , Vol. 15(4), pp 493-517, 2007.

[5] Q. Zhang and H. Li, MOEA/D: A Multi-objective Evolutionary Algorithm Based on Decomposition, IEEE Trans. on Evolutionary Computation, vol.11, no. 6, pp712-731 2007

Skills Required: Good knowledge in C++ and Python

Skills that help: Have a good sense of direction

Mentors: Annalisa Riccardi and Marcus Märtens

### Hyperfast hypervolumes for Multi-objective optimization - difficulty 4.5

Multi-objective optimizers usually do not search for the single best solution of a problem, because there is not such a thing if you have more than 1 objective. Instead, they create a set of solutions that represent the best-possible trade-offs between all involved objectives. These solution-sets are called Pareto-approximations, as they try to approximate the Pareto-front of the problem.

While a comparison of two solutions with respect to a single objective is straight-forward (take the better one), assigning a quality to a whole Pareto-approximation is a difficult matter. The hypervolume indicator can be used to assign a quality to a Pareto-approximation, defined by the volume of the fitness space that is dominated by the Pareto-approximation. It is the only unary single set quality measure that is known to be strictly monotonic with regard to Pareto dominance (That is, the Pareto-optimal front guarantees the maximum possible hypervolume while any worse set will assign a lower hypervolume). This is nice, since now we can compare different solution sets and take the one with the best hypervolume.

As determining the exact hypervolume of a number of points is computationally #P-hard [1], most exact computations unfortunately scale exponentially in the number of objectives. However, recent advances in the research on multi-objective optimization provide fast approximations on a reasonable precision that will allow not only to use the hypervolume to judge the quality of an algorithm, but also to design efficient evolutionary algorithms based on the hypervolume as a mechanism for selection, diversification or archiving. Thus, the hypervolume is an extremely versatile concept and would significantly level up PaGMOs capabilities for multi-objective optimization! We suppose, the students tasks should consist of

• Adding exact and approximative methods for the computation of the hypervolume (candidates could be from [1], [2], [3], [5], [6])
• Testing the performance and quality of the approximations for varying parameters on the available multi-objective problems of PaGMO
• Adding one or more multi-objective algorithms based on the hypervolume indicator (a candidate could be Hype [2] or maybe SMS-EMOA [7])
• Implementing an efficient migration operator based on the contribution of single individuals to the hyper-volume (preferably based on [4])

Implementation Details: the population class needs to be enhanced by a method get_hypervolume. Given a set of reference points (a vector of fitness vectors) the method shall return a value reflecting the exact hypervolume of the population corresponding to the reference points. An optional keyword 'variant' shall be used to switch between the exact computation and different approximative methods. All algorithms using the hypervolume are supposed to use the functionalities of the population class and not to compute the hypervolume by themselves!

A new replacement policy has to be added based on the exclusive hypervolume (i.e. the contribution of a single individual to the collective hypervolume). This replacement strategy should select the n members of the population which contribute the least to the hypervolume. Accordingly, a new selection policy shall send migrants with high contribution to the hypervolume in the their home population to other populations.

All functionalities need to be written in clean and documented C++ code and made available for the Python interface. Automated functional tests need to be added to PaGMO as well.

References

[1] Bringmann, Karl, and Tobias Friedrich. "Approximating the least hypervolume contributor: NP-hard in general, but fast in practice." Theoretical Computer Science 425 (2012): 104-116. paper

[2] Bader, Johannes, and Eckart Zitzler. "HypE: An algorithm for fast hypervolume-based many-objective optimization." Evolutionary computation 19.1 (2011): 45-76. paper

[3] Fonseca, Carlos M., Luıs Paquete, and Manuel López-Ibánez. "An improved dimension-sweep algorithm for the hypervolume indicator." Evolutionary Computation, 2006. CEC 2006. IEEE Congress on. IEEE, 2006.

[4] Bradstreet, Lucas, Lyndon While, and Luigi Barone. "A fast incremental hypervolume algorithm." Evolutionary Computation, IEEE Transactions on 12.6 (2008): 714-723. paper

[5] Guerreiro, Andreia P., Carlos M. Fonseca, and Michael TM Emmerich. "A Fast Dimension-Sweep Algorithm for the Hypervolume Indicator in Four Dimensions." paper

[6] Emmerich, Michael, and Carlos Fonseca. "Computing hypervolume contributions in low dimensions: asymptotically optimal algorithm and complexity results." Evolutionary Multi-Criterion Optimization.

[7] Beume, Nicola, Boris Naujoks, and Michael Emmerich. "SMS-EMOA: Multiobjective selection based on dominated hypervolume." European Journal of Operational Research 181.3 (2007): 1653-1669.

Skills Required: Good knowledge in C++, Python

Skills that help: Thinking in more than 1 dimension, understanding of high dimensional geometries, the concept of Pareto-dominance and not being too afraid of recursion

Mentors: Marcus Märtens and Luís F. Simões

## Constraints Handling Techniques

### Constraints Handling Techniques in Evolutionary Algorithms - difficulty 5

Evolutionary Optimization techniques have been successfully applied in a wide range of real-life optimization problems and a variety of them have been already integrated into the PaGMO/PyGMO collection of heuristic optimization methods. They are though still limited to solve unconstrained optimization problems. Several constraint handling approaches for evolutionary algorithms have been developed in the recent years and divided in five main classes (survey of techniques in [1]): Penalty Methods, Special Representations and Operators, Repair Algorithms, Separation of Objective and Constraints and Hybrid Methods. Among them few approaches have been identified as promising techniques to be integrated into the existing PaGMO/PyGMO infrastructure. In particular:

• Four Penalty Methods: Death penalty (infeasible individual are rejected in the selection procedure regardless their level of infeasibility, [2]); Self Adaptive Fitness Formulation (the fitness of the infeasible individual is increased as to favor those solutions which are nearly feasible and also have a good objective function value, [3]); Stochastic Ranking (balancing the influence of objective function and penalty function in assigning the individual fitness during the ranking process of the population, [4]) and Co-evolutionary Penalty (collaborative evolution of two populations, the second one encoding the penalty factors to be used in the fitness evaluation of the individuals of the first population, [5]).
• Separation of Objective and Constraints: Multi-objective Optimization (the constrained optimization problem is transformed into a multiobjective optimization problem where each constraint violation represents an additional objective [6]).
• Hybrid Methods: Immune System (part of the feasible population is transformed into antigen population, the infeasible solutions generate antibodies those evolving, with particular immune operators, until they become sufficiently similar to the antigen population and then recombined to continue with the evolution [7]); Local Repair Techniques (the infeasible solutions are locally repaired with gradient based unconstrained optimization techniques minimizing the sum of the constraints violations [8]).

The goal of this project is to enhance the capabilities of the PaGMO/PyGMO heuristic optimization solvers to effectively solve constrained optimization problems. The task of the student consists of:

• Implement the selected constraints handling techniques listed above into the PaGMO/PyGMO framework.
• Implement the 24 constrained benchmark problems presented in [9].
• Test the Evolutionary Algorithms of PaGMO/PyGMO (Differential Evolution, Particle Swarm Optimization, Genetic Algorithm, Artificial Bee Colony and Covariance Matrix Adaptation-ES) against all the constraints handling techniques.

Implementation Details: Each approach has a different implementation strategy:

• Death Penalty and Multi-objective Optimization redefine statically the optimization problem (i.e. the new optimization problem is the same during all the evolutions), two meta-problems need to be created to cope with these techniques. The meta-problem takes the original constrained problem as input and generate a new unconstrained problem (respectively single and multi-objective).
• Self Adaptive and Stochastic Ranking redefine dynamically the optimization problem (information are extrapolated from the current population to assign the new fitness function), a meta-algorithm that internally invokes the evolution of the population on a modified problem need to be implemented. The meta-algorithm is constructed given an heuristic algorithm and the constrained problem.
• Co-Evolutionary Penalty and Immune System are meta algorithms as well. They are constructed as before with an heuristic algorithm and a constrained problem. Internally the method performs a preprocessing on the population and defines a modified problem suitable for the evolution.
• Local Repair Technique is a meta-algorithm that takes as argument two algorithms (an heuristic and a local one), it instantiates a new population (the infeasible solutions) with an unconstrained problem defined by the sum of the violation of the constraints and invokes the evolve method of the local technique to solve it.

All code needs to be well written and commented. Documentation needs to be modified according to the changes done in the existing modules.

References

[1] Coello Coello C.A. Theoretical and Numerical Constraints-Handling Techniques Used with Evolutionary Algorithms: A Survey of the State of the Art, Comput. Methods Appl. Mech Engrg. 191, pp. 1245-1287, 2000.

[2] Bäck, T., Hoffmeister, F. and Schwell, H.P. A Survey of evolution strategies, Proceedings of the Fourth International Conference on Genetic Algorithms, Morgan Kaufmann, 2-9, 1991.

[3] Farmani, R. and Wright, J. Self-adaptive fitness formulation for constrained optimization, IEEE Transactions on Evolutionary Computation, 7 (5), pp. 445- 455, 2003.

[4] Runarsson T.P. and Xin Yao, Stochastic Ranking for Constrained Evolutionary Optimization, IEEE Transactions on Evolutionary Computation, Vol. 4, No. 3, pp. 284-294, 2000.

[5] Coello Coello C.A. Use of Self-Adaptive Penalty Approach for Engineering Optimization Problems, Comput. Ind. 41(2), pp. 113-127, 2000.

[6] Coello Coello C.A. Treating Constraints as Objectives for Single-Objective Evolutionary Optimization, Engrg. Optim., 32(3), pp. 275-308, 2000.

[7] Hajela P. and Lee J. Constrained Genetic Search via Schema Adaptation. An Immune Network Solution, Proceedings of the First World Congress of Structural and Multidisciplinary Optimization, pp. 915-920, 1995.

[8] Belur S.V. CORE: Constrained Optimization by Random Evolution, Late Breaking Papers at the Genetic Programming Conference, pp. 280-286, 1997.

[9] J. J. Liang, T. P. Runarsson, E. Mezura-Montes, M. Clerc, P. N. Suganthan, C. A. Coello Coello, K. Deb, Problem Definitions and Evaluation Criteria for the CEC 2006, Special Session on Constrained Real-Parameter Optimization, Technical Report, Nanyang Technological University, Singapore, 2006.

Skills Required: Good knowledge in C++ and Python

Skills that help: Being not afraid of going beyond limits

Mentors: Annalisa Riccardi and Marcus Märtens