-------------------------------------------------
EvoMS web demo
http://fich.unl.edu.ar/sinc/blog/web-demo/evoms/
-------------------------------------------------
-------- CONTENTS --------
0- DESCRIPTION AND USAGE
1- REACTIONS
2- COMPOUNDS
3- SETTINGS
4- OUTPUTS
5- EXAMPLES
====================================
0- DESCRIPTION AND USAGE
====================================
The Evolutionary Metabolic Synthesizer (EvoMS) is an evolutionary
tool capable of finding novel metabolic pathways linking several
compounds through feasible reactions. It allows systems biologists
to explore different alternatives for relating specific metabolites,
offering the possibility of indicating the initial compound or
allowing the algorithm to automatically select it. Metabolic pathways
found are displayed in the web browser as directed graphs. In all
cases, solutions are networks of reactions that produce linear or
branched metabolic pathways which are feasible from the specified
set of available compounds.
To perform a search you only need to provide the following three
files: 1) REACTIONS.txt, containing the reactions to be taken into
account in the search; 2) COMPOUNDS.yaml, the compounds to relate;
and 3) SETTINGS.yaml, with the settings for the experiment.
These files are described below.
====================================
1- REACTIONS
====================================
Reactions must be provided in a plain text file by specifying their codes
and chemical equations according to standard KEGG notation. Every
reversible reaction must be split into two semireactions with opposite
directions. Information on reversibility can be extracted from the KGML
files (XML files) of KEGG.
Example:
-- KEGG --
(http://www.genome.jp/dbget-bin/www_bget?rn:R00097)
R00097: C00004 + C00992 <---> C00080 + C00003 + C00001 + C00541
should be split as:
-- direct reaction --
R00097: C00004 + C00992 ---> C00080 + C00003 + C00001 + C00541
and
-- reverse reaction --
R00097: C00080 + C00003 + C00001 + C00541 ---> C00004 + C00992
Two files, containing 77 and 154 semireactions, are provided as
examples (see section 5).
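The splitting rule above can be sketched in a few lines of Python. This is only an illustration and not part of EvoMS: the function name split_reversible is hypothetical, and it assumes one reaction per line with "<--->" marking a reversible reaction, as in the example above.

```python
def split_reversible(line):
    """Split 'Rxxxxx: A + B <---> C + D' into two semireaction lines.

    Lines without '<--->' are assumed to be semireactions already and
    are returned unchanged (hypothetical helper, not part of EvoMS).
    """
    code, equation = line.split(":", 1)
    if "<--->" not in equation:
        return [line.strip()]
    left, right = (side.strip() for side in equation.split("<--->"))
    code = code.strip()
    # Direct reaction first, then the reverse one, both with the same code.
    return [f"{code}: {left} ---> {right}", f"{code}: {right} ---> {left}"]


for semireaction in split_reversible(
    "R00097: C00004 + C00992 <---> C00080 + C00003 + C00001 + C00541"
):
    print(semireaction)
```

Both semireactions keep the original KEGG code, as in the example above.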
====================================
2- COMPOUNDS
====================================
Compounds used in the search must be specified in this text file.
It defines two sets and is structured as follows:
abundant ---> freely available compounds (e.g. H2O, ATP, NAD).
relate ---> compounds to relate with a metabolic pathway.
Compounds to relate must be specified by adding the following structure
at the end of the file for each new compound:
---> -
---> compound: <compound code> (e.g. C00118)
---> initial: <yes/no> (whether it may be used as the initial compound
in the search)
For example, given three compounds C00001, C00002, C00003 and the following
configuration:
-
compound: C00001
initial: yes
-
compound: C00002
initial: no
-
compound: C00003
initial: yes
C00001 and C00003 can be used as initial compounds.
An example file for searching a metabolic pathway between two compounds
is provided (see section 5). An example list of abundant compounds and
cofactors is also provided with this file.
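As an illustration of how the "initial" flag can be read, the Python sketch below selects the candidate initial compounds from a list of entries. It assumes the entries have already been loaded into dictionaries with the keys "compound" and "initial". Whether "yes" arrives as a string or as a boolean depends on the YAML parser, so both are accepted. The function name is hypothetical.

```python
def initial_compounds(relate):
    """Return the codes of the compounds flagged with 'initial: yes'."""
    return [entry["compound"] for entry in relate
            if entry["initial"] in (True, "yes")]


# Same configuration as in the example above, written as Python data.
relate = [
    {"compound": "C00001", "initial": "yes"},
    {"compound": "C00002", "initial": "no"},
    {"compound": "C00003", "initial": "yes"},
]
print(initial_compounds(relate))  # ['C00001', 'C00003']
```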
====================================
3- SETTINGS
====================================
This file specifies:
--- parameters of the evolutionary algorithm ---
M ---> population size.
NM ---> maximum number of reactions in a metabolic pathway.
px ---> crossover probability.
pm ---> mutation probability.
pE ---> erasure probability.
pV ---> insertion probability.
Gmax ---> maximum number of generations.
K ---> number of individuals in each selection tournament.
A typical configuration for all these parameters can be
downloaded as an example.
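Before launching a search it can help to check the settings for obvious mistakes. The Python sketch below is a hypothetical helper, not part of EvoMS. It assumes the parameters are available as a dictionary keyed by the names listed above, that M, NM, Gmax and K are positive integers, and that px, pm, pE and pV are probabilities in [0, 1]. The values in the usage example are placeholders, not recommended settings.

```python
INT_KEYS = ("M", "NM", "Gmax", "K")
PROB_KEYS = ("px", "pm", "pE", "pV")


def check_settings(settings):
    """Return a list of problems found in a settings dict (empty if none)."""
    problems = []
    for key in INT_KEYS:
        value = settings.get(key)
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 1:
            problems.append(f"{key} must be a positive integer")
    for key in PROB_KEYS:
        value = settings.get(key)
        if (not isinstance(value, (int, float)) or isinstance(value, bool)
                or not 0 <= value <= 1):
            problems.append(f"{key} must be a probability in [0, 1]")
    return problems


# Placeholder values for illustration only.
example = {"M": 100, "NM": 20, "px": 0.8, "pm": 0.1,
           "pE": 0.05, "pV": 0.05, "Gmax": 500, "K": 5}
print(check_settings(example))
```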
=================================
4- OUTPUTS
=================================
Several outputs are provided when the search ends. They include:
- A TXT file with a list of chemical equations representing
the best pathway found.
- An HTML file with an interactive representation of
the best pathway found (see section 4.1).
- An on-screen interactive representation of the best
pathway found (see section 4.1).
- A TXT file with all pathways in the last generation,
sorted according to their fitness. They are provided as
lists of chemical equations.
- A YAML file containing all measurements made during
the search (see section 4.2).
- A PDF file with plots of all measurements made during
the search (see section 4.2).
- An image with plots of all measurements made
during the search (see section 4.2).
---------------------------------
4.1- PATHWAY
---------------------------------
Pathway visualization is based on the D3 JavaScript library (www.d3js.org).
The metabolic pathways found can be viewed directly on the results page,
or by downloading "out/<name of the experiment>_pathway.html" and
opening it with Firefox or Chrome.
Metabolic pathways are shown with the following set of lines and colors:
red ---> initial substrate
yellow ---> compounds to be produced from the initial substrate
blue ---> reactions
light blue ---> compounds produced in the pathway
violet ---> external compounds
green ---> abundant compounds
--
solid lines ---> substrate of the reaction
dashed lines ---> product of the reaction
The external compounds correspond to substrates used by some reaction
in the search space and not synthesized by any other reaction.
Consequently, EvoMS automatically determines them by analyzing the
products and substrates of the set of reactions.
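The definition of external compounds can be sketched as a set difference. This is an illustration under stated assumptions, not the EvoMS implementation: the function names are hypothetical, each line is assumed to be a semireaction in the format of section 1, and stoichiometric coefficients are assumed to be absent. How EvoMS treats abundant compounds in this step is not covered here.

```python
def parse_semireaction(line):
    """Parse 'Rxxxxx: A + B ---> C + D' into (code, substrates, products)."""
    code, equation = line.split(":", 1)
    left, right = equation.split("--->")
    substrates = {c.strip() for c in left.split("+")}
    products = {c.strip() for c in right.split("+")}
    return code.strip(), substrates, products


def external_compounds(lines):
    """Substrates of some reaction that no reaction in the set produces."""
    all_substrates, all_products = set(), set()
    for line in lines:
        _, substrates, products = parse_semireaction(line)
        all_substrates |= substrates
        all_products |= products
    return all_substrates - all_products


# Toy search space with made-up codes, for illustration only.
toy = ["R1: X1 + X2 ---> X3", "R2: X3 ---> X4"]
print(sorted(external_compounds(toy)))  # ['X1', 'X2']
```

Note that when both semireactions of a reversible reaction are present, every compound in it is also a product, so none of them is external.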
The web representation can be manipulated interactively to rearrange the elements
of the metabolic pathway using the mouse (drag and drop to lock and
double-click to unlock a node).
---------------------------------
4.2- MEASURES
---------------------------------
Searching is guided by a fitness function based on four terms normalized
in the range [0,1]:
- Validity: determines the proportion of reactions (genes) in the pathway
(chromosome) that have all the required substrates to be performed.
- Related compounds: calculates the fraction of compounds to relate that
are already included in the pathway (chromosome).
- Rate of useful products: calculates the fraction of reactions in the pathway
that produce at least one new compound not already included in it.
It indicates which reactions are important and not redundant.
- Connectivity: evaluates the proportion of elements to relate for which
there exists a sequence of reactions that links them to the initial compound.
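Three of these terms can be sketched in Python as follows. This is one possible reading of the descriptions above, not the EvoMS code. It assumes a pathway is an ordered list of (substrates, products) pairs of sets, that reactions are evaluated in that order, and that "available" holds the initial and abundant compounds. Connectivity is omitted because it requires a graph search. All names are hypothetical.

```python
def validity(pathway, available):
    """Fraction of reactions whose substrates are all available when reached."""
    if not pathway:
        return 0.0
    known = set(available)
    feasible = 0
    for substrates, products in pathway:
        if substrates <= known:
            feasible += 1
            known |= products  # assumption: only feasible reactions add products
    return feasible / len(pathway)


def related_compounds(pathway, to_relate):
    """Fraction of the compounds to relate that appear in the pathway."""
    if not to_relate:
        return 0.0
    present = set()
    for substrates, products in pathway:
        present |= substrates | products
    return len(set(to_relate) & present) / len(to_relate)


def useful_products(pathway, available):
    """Fraction of reactions producing at least one compound not seen before."""
    if not pathway:
        return 0.0
    seen = set(available)
    useful = 0
    for substrates, products in pathway:
        if products - seen:
            useful += 1
        seen |= substrates | products
    return useful / len(pathway)


# Toy pathway with made-up compound names: A is initial, W is abundant.
pathway = [({"A"}, {"B"}), ({"B", "W"}, {"C"}), ({"X"}, {"B"})]
available = {"A", "W"}
print(validity(pathway, available))
print(related_compounds(pathway, ["A", "C"]))
print(useful_products(pathway, available))
```

In the toy pathway the third reaction lacks its substrate X and only produces B again, so it lowers both the validity and the rate of useful products.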
Evolution can be assessed through a set of six plots shown by EvoMS
at the end of the search. The following outline indicates the layout
of the plots:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
1- Evolution of the pathway size:
- red ---> number of reactions in the best solution.
- blue ---> average number of reactions in the population.
2- Histogram of pathway sizes in the current generation (mainly
useful in the desktop version of EvoMS).
3- Evolution of the terms of the fitness function for the best individual:
- red ---> Validity
- blue ---> Related compounds
- green ---> Rate of useful products
- magenta ---> Connectivity
4- Evolution of the average and maximum fitness, proportion
of available reactions used, and proportion of pathways
initialized with each compound to relate:
- blue ---> Fitness of the best individual.
- red ---> Average fitness of the population.
- green ---> Proportion of the reactions database used by the
population.
- others ---> Proportion of solutions starting with each compound.
There are as many of these curves as initial compounds.
5- Individuals represented in terms of pathway size and fitness.
6- Evolution of the average values of terms in the fitness function for
the population:
- red ---> Validity
- blue ---> Related compounds
- green ---> Rate of useful products
- magenta ---> Connectivity
=================================
5- EXAMPLES
=================================
The first example provided (COMPOUNDS_Example1.yaml,
SETTINGS_Example1.yaml, REACTIONS_77.txt) corresponds to the search for
a metabolic pathway linking alpha-D-Glucose-1P (C00103) and
Glycerate-2P (C00631), where alpha-D-Glucose-1P (C00103) is
specified as the initial substrate. The search space consists
of 77 semireactions.
The second example provided (COMPOUNDS_Example2.yaml,
SETTINGS_Example2.yaml, REACTIONS_154.txt) corresponds to the search for
a metabolic pathway linking L-Glutamate (C00025), Fumarate (C00122)
and D-Proline (C00763), without specifying the initial compound.
The search space has 154 semireactions.