-------------------------------------------------
EvoMS web demo
http://fich.unl.edu.ar/sinc/blog/web-demo/evoms/
-------------------------------------------------


--------    CONTENTS      --------

0- DESCRIPTION AND USAGE
1- REACTIONS
2- COMPOUNDS
3- SETTINGS
4- OUTPUTS
5- EXAMPLES


====================================
0- DESCRIPTION AND USAGE
====================================

The Evolutionary Metabolic Synthesizer (EvoMS) is an evolutionary
tool capable of finding novel metabolic pathways that link several
compounds through feasible reactions. It allows systems biologists
to explore alternative ways of relating specific metabolites, either
by indicating the initial compound or by letting the algorithm select
it automatically. The metabolic pathways found are displayed in the
web browser as directed graphs. In all cases, solutions are networks
of reactions forming linear or branched metabolic pathways that are
feasible from the specified set of available compounds.

To perform a search you only need to provide the following three
files: 1) REACTIONS.txt, containing the reactions to be taken into
account in the search; 2) COMPOUNDS.yaml, listing the abundant
compounds and the compounds to relate; and 3) SETTINGS.yaml, with
the settings for the experiment.
These files are described below.



====================================
1- REACTIONS
====================================
Reactions must be provided in a plain text file, specifying their codes
and chemical equations in standard KEGG notation. Every reversible
reaction must be split into two semireactions running in opposite
directions. Information on reversibility can be extracted from the KEGG
KGML files (XML files).

Example:

-- KEGG --
(http://www.genome.jp/dbget-bin/www_bget?rn:R00097)
R00097: C00004 + C00992 <---> C00080 + C00003 + C00001 + C00541 

should be split into:

-- direct reaction --
R00097: C00004 + C00992 ---> C00080 + C00003 + C00001 + C00541

and

-- reverse reaction --
R00097: C00080 + C00003 + C00001 + C00541 ---> C00004 + C00992
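The splitting step can be automated. Below is a minimal Python sketch, not part of EvoMS, that assumes one reaction per line written as in the example above (CODE: substrates <---> products). The function name split_reversible is hypothetical, and the sketch does not read KGML files: it should only be applied to reactions already known to be reversible.

```python
ARROW_REVERSIBLE = "<--->"
ARROW_DIRECT = "--->"


def split_reversible(line: str) -> list[str]:
    """Split 'CODE: lhs <---> rhs' into direct and reverse semireactions.

    Hypothetical helper. Lines that already use the one-way arrow
    '--->' are returned unchanged (only whitespace is normalized).
    """
    code, equation = (part.strip() for part in line.split(":", 1))
    if ARROW_REVERSIBLE not in equation:
        return [f"{code}: {equation}"]
    lhs, rhs = (side.strip() for side in equation.split(ARROW_REVERSIBLE, 1))
    return [
        f"{code}: {lhs} {ARROW_DIRECT} {rhs}",  # direct reaction
        f"{code}: {rhs} {ARROW_DIRECT} {lhs}",  # reverse reaction
    ]
```

Applied to the R00097 line above, it returns the direct and the reverse semireaction in that order, both keeping the code R00097.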

Two files, containing 77 and 154 semireactions, are provided as
examples (see section 5).


====================================
2- COMPOUNDS
====================================
Compounds used in the search must be specified in the COMPOUNDS.yaml
text file. It defines two sets and is structured as follows:

abundant ---> freely available compounds (e.g. H2O, ATP, NAD).
relate   ---> compounds to relate with a metabolic pathway.

Each compound to relate must be specified by appending the following
structure at the end of the file:

---> -
--->   compound: <compound code> (e.g. C00118)
--->   initial: <yes/no> (whether it can be used as an initial
                          compound in the search)

For example, given three compounds C00001, C00002, C00003 and the following
configuration:

-
  compound: C00001
  initial: yes
-
  compound: C00002
  initial: no
-
  compound: C00003
  initial: yes

C00001 and C00003 can be used as initial compounds.
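The rule for initial compounds can be illustrated with a small Python sketch that reads list entries like the ones shown above and returns the codes marked "initial: yes". The helpers parse_relate_entries and initial_compounds are hypothetical, not EvoMS code, and they only understand this flat layout rather than general YAML.

```python
EXAMPLE = """\
-
  compound: C00001
  initial: yes
-
  compound: C00002
  initial: no
-
  compound: C00003
  initial: yes
"""


def parse_relate_entries(text: str) -> list[dict[str, str]]:
    """Minimal reader for '-' / 'compound:' / 'initial:' blocks.

    Hypothetical helper; not a YAML parser. Lines with a colon that
    appear before the first '-' (e.g. a 'relate:' key) are ignored.
    """
    entries: list[dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line == "-":
            entries.append({})  # a bare dash starts a new compound entry
        elif ":" in line and entries:
            key, value = (part.strip() for part in line.split(":", 1))
            entries[-1][key] = value
    return entries


def initial_compounds(entries: list[dict[str, str]]) -> list[str]:
    """Return the codes of the compounds flagged 'initial: yes'."""
    return [e["compound"] for e in entries
            if e.get("initial", "no").lower() == "yes"]


print(initial_compounds(parse_relate_entries(EXAMPLE)))  # ['C00001', 'C00003']
```

If the file is read with a full YAML 1.1 parser such as PyYAML instead, the values yes/no are returned as booleans rather than strings, so the comparison in initial_compounds would need to change accordingly.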

An example file for searching a metabolic pathway between two compounds
is provided (see section 5). An example list of abundant compounds and 
cofactors is also provided with this file.



====================================
3- SETTINGS
====================================
The SETTINGS.yaml file specifies:

--- parameters of the evolutionary algorithm ---
 M          ---> population size.
 NM         ---> maximum number of reactions in a metabolic pathway.
 px         ---> crossover probability.
 pm         ---> mutation probability.
 pE         ---> erasure probability.
 pV         ---> insertion probability.
 Gmax       ---> maximum number of generations.
 K          ---> number of individuals on each selection tournament.

A typical configuration for all of these parameters can be
downloaded as an example.
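As a sanity check before launching a search, the parameters can be validated with a sketch like the one below. The parameter values, the function names and the constraints (positive integer sizes, probabilities in [0,1], K not larger than M) are assumptions for illustration, not values taken from the EvoMS example files. The tournament_select function shows standard tournament selection with K contenders, sampled here without replacement; whether EvoMS samples with or without replacement is not stated in this README.

```python
import random

# Hypothetical values for illustration only; NOT taken from the
# SETTINGS files distributed with EvoMS.
SETTINGS = {
    "M": 100, "NM": 20, "px": 0.9, "pm": 0.1,
    "pE": 0.5, "pV": 0.5, "Gmax": 500, "K": 3,
}


def check_settings(s: dict) -> list[str]:
    """Return a list of problems found (an empty list means no problems).

    The constraints checked here are assumptions, not EvoMS rules.
    """
    problems = []
    for key in ("M", "NM", "Gmax", "K"):
        if not isinstance(s.get(key), int) or s[key] < 1:
            problems.append(f"{key} must be a positive integer")
    for key in ("px", "pm", "pE", "pV"):
        value = s.get(key)
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            problems.append(f"{key} must be a probability in [0, 1]")
    # Only compare K and M once both are known to be valid integers.
    if not problems and s["K"] > s["M"]:
        problems.append("K cannot exceed the population size M")
    return problems


def tournament_select(population, fitness, k, rng=random):
    """Tournament selection: sample k distinct individuals, keep the fittest."""
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)
```

Larger values of K increase selection pressure, since weak individuals are less likely to win a tournament against more contenders.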



=================================
4- OUTPUTS
=================================
Several outputs are provided when the search ends. They include:

- A TXT file with a list of chemical equations representing
  the best pathway found.

- An HTML file with an interactive representation of
  the best pathway found (see section 4.1).

- An on-screen interactive representation of the best 
  pathway found (see section 4.1).

- A TXT file with all pathways in the last generation,
  sorted according to their fitness. They are provided as
  lists of chemical equations.

- A YAML file containing all measurements made during
  the search (see section 4.2).

- A PDF file with plots of all measurements made during
  the search (see section 4.2).
  
- An image with plots of all measurements made
  during the search (see section 4.2).
  

---------------------------------
4.1- PATHWAY
---------------------------------
Pathway visualization is based on the d3 JavaScript library (www.d3js.org).
The metabolic pathways found can be visualized either directly in the
results page or by downloading "out/<name of the experiment>_pathway.html"
and opening it with Firefox or Chrome.

Metabolic pathways are drawn with the following colors and line styles:
  red            ---> initial substrate
  yellow         ---> compounds to be produced from the initial substrate
  blue           ---> reactions
  light blue     ---> compounds produced in the pathway
  violet         ---> external compounds
  green          ---> abundant compounds
--
  solid lines    ---> substrate of the reaction
  dashed lines   ---> product of the reaction

The external compounds are substrates used by some reaction in the
search space but not synthesized by any other reaction. EvoMS
therefore determines them automatically by analyzing the products
and substrates of the set of reactions.
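This rule amounts to a set difference: all substrates in the search space minus all products. The Python sketch below is a hypothetical illustration, not EvoMS code. It assumes semireactions written as "CODE: A + B ---> C" without stoichiometric coefficients, simplifies "not synthesized by any other reaction" to "not synthesized by any reaction", does not filter out abundant compounds, and uses placeholder compound names.

```python
def parse_semireaction(line: str) -> tuple[str, set[str], set[str]]:
    """Split 'CODE: A + B ---> C' into (code, substrates, products).

    Assumes no stoichiometric coefficients (e.g. '2 C00080' is not handled).
    """
    code, equation = (part.strip() for part in line.split(":", 1))
    lhs, rhs = equation.split("--->", 1)
    substrates = {c.strip() for c in lhs.split("+") if c.strip()}
    products = {c.strip() for c in rhs.split("+") if c.strip()}
    return code, substrates, products


def external_compounds(lines: list[str]) -> set[str]:
    """Substrates consumed somewhere in the search space but produced by no reaction."""
    consumed: set[str] = set()
    produced: set[str] = set()
    for line in lines:
        _, substrates, products = parse_semireaction(line)
        consumed |= substrates
        produced |= products
    return consumed - produced


# Toy search space with placeholder compounds: only A is never produced.
toy_reactions = ["Ra: A + B ---> C", "Rb: C ---> D + B"]
print(external_compounds(toy_reactions))  # {'A'}
```

Note that the two semireactions of a reversible reaction produce each other's substrates, so compounds involved only in such a pair are never reported as external.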
  
The web representation is interactive: the elements of the metabolic
pathway can be rearranged with the mouse (drag and drop a node to lock
it in place; double-click it to unlock it).


---------------------------------
4.2- MEASURES
---------------------------------
The search is guided by a fitness function built from four terms, each
normalized to the range [0,1]:

- Validity: determines the proportion of reactions (genes) in the pathway
  (chromosome) that have all the required substrates to be performed.

- Related compounds: calculates the fraction of compounds to relate that
  are already included in the pathway (chromosome).

- Rate of useful products: calculates the fraction of reactions in the pathway
  that produce at least one new compound, i.e. one not already included in it.
  It indicates which reactions are relevant rather than redundant.

- Connectivity: evaluates the proportion of elements to relate for which
  there exists a sequence of reactions that links them to the initial compound.
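The four terms can be sketched in Python as follows. This is one possible reading of the definitions above, not the EvoMS implementation. The reaction representation, the function names, the way substrates become available (reactions fire repeatedly from an initial pool, e.g. abundant compounds plus the initial compound), the use of chromosome order for "already included", and the exclusion of the initial compound from the connectivity denominator are all assumptions. How EvoMS combines the four terms into a single fitness value is not described in this README, so no combination is shown.

```python
from collections import deque

# Hypothetical representation: a reaction is (code, substrates, products) and a
# pathway (chromosome) is an ordered list of reactions. Not EvoMS's format.
Reaction = tuple[str, frozenset[str], frozenset[str]]


def validity(pathway: list[Reaction], available: set[str]) -> float:
    """Fraction of reactions whose substrates can all be supplied.

    Assumption: starting from `available`, any reaction whose substrates are
    present fires and adds its products, until no further reaction can fire.
    """
    if not pathway:
        return 0.0
    pool, fired = set(available), set()
    changed = True
    while changed:
        changed = False
        for i, (_, subs, prods) in enumerate(pathway):
            if i not in fired and subs <= pool:
                fired.add(i)
                pool |= prods
                changed = True
    return len(fired) / len(pathway)


def related_fraction(pathway: list[Reaction], relate: set[str]) -> float:
    """Fraction of compounds to relate that appear anywhere in the pathway."""
    present: set[str] = set()
    for _, subs, prods in pathway:
        present |= subs | prods
    return len(relate & present) / len(relate) if relate else 0.0


def useful_rate(pathway: list[Reaction], available: set[str]) -> float:
    """Fraction of reactions producing a compound not seen earlier in the chromosome."""
    if not pathway:
        return 0.0
    seen, useful = set(available), 0
    for _, subs, prods in pathway:
        seen |= subs
        if prods - seen:
            useful += 1
        seen |= prods
    return useful / len(pathway)


def connectivity(pathway: list[Reaction], initial: str, relate: set[str]) -> float:
    """Fraction of the other compounds to relate reachable from `initial`
    by following substrate -> product links (breadth-first search)."""
    targets = relate - {initial}
    if not targets:
        return 0.0
    reached, queue = {initial}, deque([initial])
    while queue:
        compound = queue.popleft()
        for _, subs, prods in pathway:
            if compound in subs:
                for p in prods - reached:
                    reached.add(p)
                    queue.append(p)
    return len(targets & reached) / len(targets)


# Toy pathway with placeholder compounds (A is the initial compound, X is abundant).
toy = [
    ("R1", frozenset({"A", "X"}), frozenset({"B"})),
    ("R2", frozenset({"B"}), frozenset({"C"})),
    ("R3", frozenset({"Z"}), frozenset({"D"})),
]
```

In the toy pathway, R3 cannot fire because Z is never available, so validity is 2/3, while connectivity between A and C is 1.0 through R1 and R2.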

Evolution can be assessed through a set of six plots shown by EvoMS at the end
of the search. The following outline indicates the layout of the plots:

+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+

1- Evolution of the pathway size:

- red  ---> number of reactions in the best solution.
- blue ---> average number of reactions in the population.


2- Histogram of pathway sizes in the current generation (mainly
useful in the desktop version of EvoMS).


3- Evolution of the terms of the fitness function for the best individual:

- red     ---> Validity
- blue    ---> Related compounds
- green   ---> Rate of useful products
- magenta ---> Connectivity


4- Evolution of the average and maximum fitness, proportion
   of available reactions used, and proportion of pathways
   initialized with each compound to relate:

- blue   ---> Fitness of the best individual.
- red    ---> Average fitness of the population.
- green  ---> Proportion of the reactions database used by the 
              population.
- others ---> Proportion of solutions starting with each compound.
              There is one curve per candidate initial compound.


5- Individuals represented in terms of pathway size and fitness.


6- Evolution of the average values of terms in the fitness function for
   the population:

- red     ---> Validity
- blue    ---> Related compounds
- green   ---> Rate of useful products
- magenta ---> Connectivity
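The curves in plots 1 and 4 above are simple population statistics. The sketch below computes them for one generation, assuming a hypothetical representation in which each individual is a tuple (initial compound, reaction indices, fitness); generation_stats is not an EvoMS function. Reactions are identified by their index in the REACTIONS file rather than by KEGG code, because the direct and reverse semireactions of a reversible reaction share the same code.

```python
from collections import Counter
from statistics import mean


def generation_stats(population, database_size):
    """Per-generation values behind plots 1 and 4 (hypothetical helper).

    population: list of (initial_compound, reaction_indices, fitness) tuples.
    database_size: number of semireactions in the REACTIONS file.
    """
    best = max(population, key=lambda ind: ind[2])
    used = set()
    for _, reactions, _ in population:
        used.update(reactions)
    starts = Counter(initial for initial, _, _ in population)
    n = len(population)
    return {
        # Plot 1
        "best_size": len(best[1]),                               # red curve
        "average_size": mean(len(r) for _, r, _ in population),  # blue curve
        # Plot 4
        "best_fitness": best[2],                                 # blue curve
        "average_fitness": mean(f for _, _, f in population),    # red curve
        "database_used": len(used) / database_size,              # green curve
        # One curve per initial compound
        "start_share": {c: k / n for c, k in starts.items()},
    }


# Toy generation with placeholder compounds and made-up fitness values.
toy_population = [("A", [0, 1, 2], 0.8), ("A", [1, 3], 0.6), ("B", [4], 0.4)]
stats = generation_stats(toy_population, database_size=77)
```

For this toy generation, 5 of the 77 semireactions appear in at least one individual, and two thirds of the individuals start from compound A.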



=================================
5- EXAMPLES
=================================
The first example provided (COMPOUNDS_Example1.yaml,
SETTINGS_Example1.yaml, REACTIONS_77.txt) corresponds to the search for
a metabolic pathway linking alpha-D-Glucose-1P (C00103) and
Glycerate-2P (C00631), where alpha-D-Glucose-1P (C00103) is
specified as the initial substrate. The search space consists
of 77 semireactions.

The second example provided (COMPOUNDS_Example2.yaml,
SETTINGS_Example2.yaml, REACTIONS_154.txt) corresponds to the search for
a metabolic pathway linking L-Glutamate (C00025), Fumarate (C00122)
and D-Proline (C00763), without specifying the initial compound.
The search space consists of 154 semireactions.

Source: README_webdemo.txt, updated 2015-01-15