%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
--------     CONTENTS     --------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

0- REQUIREMENTS
1- INSTALLATION AND DIRECTORIES
2- REACTIONS
3- COMPOUNDS
4- SETTINGS
5- OUTPUTS
6- EXAMPLES


=================================
0- REQUIREMENTS
=================================
- Matlab R2009 or later.

Optional:
- Mozilla Firefox® (www.mozilla.com) Version 26 or later, or Google Chrome®
  (https://www.google.com/intl/en/chrome/browser) Version 30 or later
  for interactive visualization.
- Parallel Computing Toolbox, for multi-core processors or clusters
  (for parEvoMS_demo and parEvoMS).



=================================
1- INSTALLATION AND DIRECTORIES
=================================
Installation only requires unzipping the "EvoMS.zip" file. The tool can then be
started by typing one of the following in the Matlab command line:

- EvoMS_demo    ---> search examples with a single core
- EvoMS         ---> main function to start the search with a single core.
- parEvoMS_demo ---> search examples in parallel mode
- parEvoMS      ---> main function to start the search in parallel mode.


Directories:
  config:            configuration files.
  db:                reactions files.
  lib:               external libraries to process data files.
  out:               pathways found by EvoMS.
  out/measures:      measures evaluated along the evolution.
  src:               source code of EvoMS.

To perform a search you only need to provide the following three
files: 1) REACTIONS.txt, containing the reactions to be taken into
account in the search; 2) COMPOUNDS.yaml, the compounds to relate;
and 3) SETTINGS.yaml, with the settings for the experiment.
These files are described below.
  

====================================
2- REACTIONS
====================================
The reactions dataset must be provided in a plain text file that specifies the
code and chemical equation of each reaction in standard KEGG notation.
Every reversible reaction must be split into two semireactions with opposite
directions. Reversibility information can be extracted from the KGML files
(XML files) of KEGG.

Example:

-- KEGG --
(http://www.genome.jp/dbget-bin/www_bget?rn:R00097)
R00097: C00004 + C00992 <---> C00080 + C00003 + C00001 + C00541 

should be split as:

-- direct reaction --
R00097: C00004 + C00992 ---> C00080 + C00003 + C00001 + C00541

and

-- reverse reaction --
R00097: C00080 + C00003 + C00001 + C00541 ---> C00004 + C00992
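
The split shown above can be automated. The following Python sketch is only
an illustration and is not part of EvoMS; the function name split_reversible
is hypothetical, and it assumes one reaction per line written exactly as in
the example ("CODE: substrates <---> products").

```python
def split_reversible(line):
    """Return the semireactions for one 'CODE: lhs <---> rhs' line."""
    code, equation = line.split(":", 1)
    code = code.strip()
    if "<--->" in equation:
        lhs, rhs = (side.strip() for side in equation.split("<--->"))
        # One direct and one reverse semireaction, both keeping the same code.
        return [f"{code}: {lhs} ---> {rhs}", f"{code}: {rhs} ---> {lhs}"]
    # Irreversible reactions are already a single semireaction.
    lhs, rhs = (side.strip() for side in equation.split("--->"))
    return [f"{code}: {lhs} ---> {rhs}"]


kegg_line = "R00097: C00004 + C00992 <---> C00080 + C00003 + C00001 + C00541"
for semireaction in split_reversible(kegg_line):
    print(semireaction)
```

Applied to R00097, this prints the direct and reverse reactions listed above.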


It is possible to create several files specifying different subsets of reactions
(e.g. for different organisms) and indicate in the "config/SETTINGS.yaml" file
which one EvoMS must use.



====================================
3- COMPOUNDS
====================================
Compounds used in the search are stored in the "config/COMPOUNDS.yaml" file.
This file defines two sets:

abundant ---> freely available compounds (e.g. H2O, ATP, NAD).
relate   ---> compounds to relate with a metabolic pathway.

A list of abundant compounds is provided in the "COMPOUNDS.yaml" file.
To specify the compounds to relate, add the following structure at the end
of the file for each new compound:

---> -
--->   compound: <compound code> (e.g. C00118)
--->   initial: <yes/no> (if it must be used as initial compound in
                          the search)

For example, given three compounds C00001, C00002, C00003 and the following
configuration:

-
  compound: C00001
  initial: yes
-
  compound: C00002
  initial: no
-
  compound: C00003
  initial: yes

C00001 and C00003 are indicated as potential initial compounds.
It is possible to manage several compounds files and choose one for a 
particular search by indicating it in "SETTINGS.yaml".
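
To check which compounds a configuration marks as initial, the entries can
be read with a few lines of Python. This is a minimal sketch outside EvoMS:
the function names are hypothetical, and the parser only handles the flat
"-" / "compound:" / "initial:" entries shown above, not general YAML or the
rest of the "COMPOUNDS.yaml" file.

```python
def parse_relate_entries(text):
    """Parse '-' / 'compound:' / 'initial:' entries into a list of dicts."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if line == "-":
            entries.append({})          # a lone dash starts a new compound
        elif ":" in line and entries:
            key, value = (part.strip() for part in line.split(":", 1))
            entries[-1][key] = value
    return entries


def initial_compounds(entries):
    """Return the codes of the compounds flagged with 'initial: yes'."""
    return [e["compound"] for e in entries if e.get("initial") == "yes"]


example = """\
-
  compound: C00001
  initial: yes
-
  compound: C00002
  initial: no
-
  compound: C00003
  initial: yes
"""
entries = parse_relate_entries(example)
print(initial_compounds(entries))
```

For the configuration above, the list contains C00001 and C00003.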



====================================
4- SETTINGS
====================================
The base EvoMS configuration is stored in the "config/SETTINGS.yaml" file.

This file specifies:

 --- main settings --- 
 name           ---> file name to save the experiment.
 reactions      ---> name of the file containing the set of reactions
                     used in the search.
 compounds      ---> name of the file with the set of available compounds
                     and the set of compounds to relate.
 Ncores         ---> number of cores to perform the search (only used in
                     the parallel version).

 --- parameters of the evolutionary algorithm ---
 M          ---> population size.
 NM         ---> maximum number of reactions in a metabolic pathway.
 px         ---> crossover probability.
 pm         ---> mutation probability.
 pE         ---> erasure probability.
 pV         ---> insertion probability.
 Gmax       ---> maximum number of generations.
 K          ---> number of individuals in each selection tournament.

A typical configuration for all these parameters is provided with the software.
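
As an illustration of the role of K, the Python sketch below shows a generic
tournament selection. It is not the EvoMS implementation, which this README
does not describe: the function name, the fitness values and the choice of
drawing contenders without replacement are all assumptions.

```python
import random


def tournament_select(fitness, K, rng=random):
    """Draw K distinct individuals at random; return the index of the fittest."""
    contenders = rng.sample(range(len(fitness)), K)
    return max(contenders, key=lambda i: fitness[i])


fitness = [0.20, 0.85, 0.40, 0.65, 0.10, 0.95]   # made-up fitness values
M = len(fitness)                                  # population size
rng = random.Random(0)                            # fixed seed, repeatable runs

# One tournament per slot of the next population.
parents = [tournament_select(fitness, K=3, rng=rng) for _ in range(M)]
print(parents)
```

In this generic scheme a larger K makes it more likely that the best
individuals win each tournament, which increases the selection pressure.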



=================================
5- OUTPUTS
=================================

---------------------------------
5.1- MEASURES
---------------------------------
The search is guided by a fitness function based on four terms, each normalized
to the range [0,1]:

- Validity: determines the proportion of reactions (genes) in the pathway
  (chromosome) that have all the required substrates to be performed.

- Related compounds: calculates the fraction of compounds to relate that
  are already included in the pathway (chromosome).

- Rate of useful products: calculates the fraction of reactions in the pathway
  that produce at least one new compound, i.e. one not already included in
  the pathway. It indicates which reactions are important and not redundant.

- Connectivity: evaluates the proportion of compounds to relate for which
  there exists a sequence of reactions that links them to the initial compound.
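
The first three terms can be sketched in Python as follows. This is one
possible reading of the definitions above, not the EvoMS code: it assumes a
pathway is an ordered list of (substrates, products) pairs, that a reaction
can be performed when all its substrates are abundant, initial, or produced
by an earlier valid reaction, and it uses toy compound labels.

```python
def fitness_terms(pathway, abundant, initial, relate):
    """Return (validity, related compounds, rate of useful products)."""
    if not pathway:
        return 0.0, 0.0, 0.0
    available = set(abundant) | {initial}   # compounds a reaction may consume
    seen = set(available)                   # compounds already in the pathway
    valid = useful = 0
    for substrates, products in pathway:
        substrates, products = set(substrates), set(products)
        if substrates <= available:
            valid += 1
            available |= products           # assumption: only valid reactions
                                            # make their products available
        if products - seen:
            useful += 1                     # at least one new compound
        seen |= substrates | products
    n = len(pathway)
    related = len(set(relate) & seen) / len(relate)
    return valid / n, related, useful / n


# Toy pathway: A + W -> B, B -> C, X -> C (X is never available).
pathway = [({"A", "W"}, {"B"}), ({"B"}, {"C"}), ({"X"}, {"C"})]
validity, related, useful = fitness_terms(
    pathway, abundant={"W"}, initial="A", relate=["A", "C"]
)
print(validity, related, useful)
```

In the toy pathway the third reaction lacks its substrate and produces
nothing new, so validity and rate of useful products are both 2/3, while
both compounds to relate are present.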

The evolution of the search can be monitored through a set of six plots
displayed by EvoMS, arranged as follows:

+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+

1- Evolution of the pathway size:

- red  ---> number of reactions in the best solution.
- blue ---> average number of reactions in the population.


2- Histogram of the pathway sizes in the current generation.


3- Evolution of the terms of the fitness function for the best individual:

- red     ---> Validity
- blue    ---> Related compounds
- green   ---> Rate of useful products
- magenta ---> Connectivity


4- Evolution of the average and maximum fitness, proportion
   of available reactions used, and proportion of pathways
   initialized with each compound to relate:

- blue   ---> Fitness of the best individual.
- red    ---> Average fitness of the population.
- green  ---> Proportion of the reactions database used by the 
              population.
- others ---> Proportion of solutions starting with each initial compound
              (one curve per initial compound).


5- Individuals represented as a function of pathway size and fitness.


6- Evolution of the average values of terms in the fitness function for
   the population:

- red     ---> Validity
- blue    ---> Related compounds
- green   ---> Rate of useful products
- magenta ---> Connectivity


---------------------------------
5.2- PATHWAYS
---------------------------------
Visualization is based on the D3 JavaScript library (www.d3js.org). The
metabolic pathways found can be visualized by opening the web page
"out/<name of the experiment>_pathway.html" with Firefox or Chrome.

Metabolic pathways are shown with the following set of lines and colors:
  red            ---> initial substrate
  yellow         ---> compounds to be produced from initial substrate
  blue           ---> reactions
  light blue     ---> compounds produced in the pathway
  violet         ---> external compounds
  green          ---> abundant compounds
--
  solid line     ---> substrate of the reaction
  dashed line    ---> product of the reaction

The external compounds correspond to substrates used by some reaction 
in the search space and not synthesized by any other reaction.
Consequently, EvoMS automatically determines them by analyzing the
products and substrates of the set of reactions.
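
The rule above amounts to a set difference between all substrates and all
products. The Python sketch below illustrates it outside EvoMS; the function
names and the toy reactions are hypothetical, and it assumes semireactions
written as "CODE: A + B ---> C" without stoichiometric coefficients.

```python
def parse_semireaction(line):
    """Split 'CODE: A + B ---> C' into (substrates, products) sets."""
    _code, equation = line.split(":", 1)
    lhs, rhs = equation.split("--->")
    substrates = {c.strip() for c in lhs.split("+")}
    products = {c.strip() for c in rhs.split("+")}
    return substrates, products


def external_compounds(lines):
    """Substrates that no reaction in the set produces."""
    substrates, products = set(), set()
    for line in lines:
        s, p = parse_semireaction(line)
        substrates |= s
        products |= p
    return substrates - products


reactions = [
    "R1: A + B ---> C",   # toy codes, not real KEGG entries
    "R2: C ---> D",
]
print(sorted(external_compounds(reactions)))
```

Here A and B are external: they are consumed by R1 but produced by no
reaction, whereas C is produced by R1 and D is never used as a substrate.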

The web representation can be manipulated interactively to rearrange the elements
of the metabolic pathway using the mouse (drag and drop to lock and
double-click to unlock a node).


=================================
6- EXAMPLES
=================================
EvoMS provides four predefined examples, each one more complex than the previous
one. To explore the examples, run "EvoMS_demo" (single core) or "parEvoMS_demo"
(more than one core) in the command line. Each example launches a visual
interface with a brief description, the database size and an estimate of
the search time (seconds per generation).

Example 1: Search for a metabolic pathway linking alpha-D-Glucose-1P
           (C00103) and Glycerate-2P (C00631), specifying alpha-D-Glucose-1P
           (C00103) as the initial compound. The search space consists
           of 77 semireactions.

Example 2: Search for a metabolic pathway linking L-Glutamate (C00025),
           Fumarate (C00122) and D-Proline (C00763), specifying L-Glutamate
           (C00025) as the initial compound. The search space consists
           of 154 semireactions.

Example 3: Search for a metabolic pathway linking alpha-D-Glucose (C00267)
           and D-Xylose (C00181), without specifying the initial compound.
           The search space consists of 589 semireactions.

Example 4: Search for a metabolic pathway linking alpha-D-Glucose (C00267),
           D-Xylose (C00181), Glyceraldehyde 3-phosphate (C00118) and Oxaloacetate
           (C00036), without specifying the initial compound. The search
           space consists of 589 semireactions.

One compounds file is provided for each example. Note that each search will be
performed taking into account the parameters specified in "SETTINGS.yaml".
Three additional files containing 77 semireactions (REACTIONS_77.txt),
154 semireactions (REACTIONS_154.txt) and 589 semireactions (REACTIONS_589.txt)
are also provided in the "db" directory.
Source: README.txt, updated 2015-04-14