Name | Modified | Size | Downloads / Week |
---|---|---|---|
Example1 | 2015-01-15 | ||
Example2 | 2015-01-15 | ||
Example1r | 2015-01-15 | ||
README_webdemo.txt | 2015-01-15 | 8.8 kB | |
Totals: 4 Items | | 8.8 kB | 0 |
-------------------------------------------------
EvoMS web demo
http://fich.unl.edu.ar/sinc/blog/web-demo/evoms/
-------------------------------------------------

--------
CONTENTS
--------
0- DESCRIPTION AND USAGE
1- REACTIONS
2- COMPOUNDS
3- SETTINGS
4- OUTPUTS
5- EXAMPLES

====================================
0- DESCRIPTION AND USAGE
====================================
The Evolutionary Metabolic Synthesizer (EvoMS) is an evolutionary tool capable of finding novel metabolic pathways linking several compounds through feasible reactions. It allows systems biologists to explore different alternatives for relating specific metabolites, either by indicating the initial compound or by letting the algorithm select it automatically. The metabolic pathways found are displayed in the web browser as directed graphs. In all cases, solutions are networks of reactions that produce linear or branched metabolic pathways which are feasible from the specified set of available compounds.

To perform a search you only need to provide the following three files:
1) REACTIONS.txt, containing the reactions to be taken into account in the search;
2) COMPOUNDS.yaml, containing the compounds to relate; and
3) SETTINGS.yaml, with the settings for the experiment.
These files are described below.

====================================
1- REACTIONS
====================================
Reactions must be provided in a plain text file by specifying their codes and chemical equations according to standard KEGG notation. Every reversible reaction must be split into two semireactions with opposite directions. Information on reversibility can be extracted from the KGML files (XML files) of KEGG.
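The splitting rule for reversible reactions can be sketched in Python as follows. This is only an illustration and not part of EvoMS: the function name split_reversible is hypothetical, and the sketch assumes one reaction per line, written as "<code>: <left side> <---> <right side>".

```python
import re

def split_reversible(line):
    """Split a reversible reaction line into two semireaction lines.

    Lines that do not contain a '<--->' style arrow are returned unchanged,
    on the assumption that they are already semireactions.
    """
    code, equation = line.split(":", 1)
    # Match '<->', '<-->', '<--->', ... with optional surrounding spaces.
    sides = re.split(r"\s*<-+>\s*", equation.strip(), maxsplit=1)
    if len(sides) != 2:
        return [line.strip()]
    left, right = sides
    direct = f"{code.strip()}: {left} ---> {right}"
    reverse = f"{code.strip()}: {right} ---> {left}"
    return [direct, reverse]

reaction = "R00097: C00004 + C00992 <---> C00080 + C00003 + C00001 + C00541"
for semireaction in split_reversible(reaction):
    print(semireaction)
```

Both semireactions keep the original reaction code, which matches the notation used in this file.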
Example:

-- KEGG -- (http://www.genome.jp/dbget-bin/www_bget?rn:R00097)
R00097: C00004 + C00992 <---> C00080 + C00003 + C00001 + C00541

should be split as:

-- direct reaction --
R00097: C00004 + C00992 ---> C00080 + C00003 + C00001 + C00541

and

-- reverse reaction --
R00097: C00080 + C00003 + C00001 + C00541 ---> C00004 + C00992

Two files, containing 77 and 154 semireactions, are provided as examples (see section 5).

====================================
2- COMPOUNDS
====================================
Compounds used in the search must be specified in this text file. It defines two sets and is structured as follows:

abundant ---> freely available compounds (e.g. H2O, ATP, NAD).
relate   ---> compounds to relate with a metabolic pathway.

Compounds to relate must be specified by adding the following structure at the end of the file for each new compound:

---> -
--->   compound: <compound code> (e.g. C00118)
--->   initial: <yes/no> (if it must be used as initial compound in the search)

For example, given three compounds C00001, C00002, C00003 and the following configuration:

- compound: C00001
  initial: yes
- compound: C00002
  initial: no
- compound: C00003
  initial: yes

C00001 and C00003 can be used as initial compounds.

An example file for searching a metabolic pathway between two compounds is provided (see section 5). An example list of abundant compounds and cofactors is also provided with this file.

====================================
3- SETTINGS
====================================
This file specifies:

--- parameters of the evolutionary algorithm ---
M    ---> population size.
NM   ---> maximum number of reactions in a metabolic pathway.
px   ---> crossover probability.
pm   ---> mutation probability.
pE   ---> erasure probability.
pV   ---> insertion probability.
Gmax ---> maximum number of generations.
K    ---> number of individuals in each selection tournament.

A typical configuration for all these parameters can be downloaded as an example.
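As a rough sanity check before uploading a settings file, the parameters listed above can be validated with a few lines of Python. This is a sketch under assumptions: the function name check_settings and all numeric values are hypothetical (they are not the typical configuration distributed with EvoMS), and the rule that K should not exceed M is an assumption about tournament selection, not something this file states.

```python
def check_settings(settings):
    """Return a list of problems found in a settings mapping (empty if none)."""
    problems = []
    # Sizes and counts: expected to be positive integers.
    for key in ("M", "NM", "Gmax", "K"):
        value = settings.get(key)
        if not isinstance(value, int) or isinstance(value, bool) or value < 1:
            problems.append(f"{key} must be a positive integer")
    # Operator probabilities: expected to lie in [0, 1].
    for key in ("px", "pm", "pE", "pV"):
        value = settings.get(key)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            problems.append(f"{key} must be a probability in [0, 1]")
    # Assumption: a tournament draws K distinct individuals from M.
    if not problems and settings["K"] > settings["M"]:
        problems.append("K should not exceed M")
    return problems

# Hypothetical values, chosen only for illustration.
settings = {"M": 100, "NM": 20, "px": 0.8, "pm": 0.05,
            "pE": 0.05, "pV": 0.05, "Gmax": 500, "K": 5}
print(check_settings(settings))
```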

=================================
4- OUTPUTS
=================================
Several outputs are provided when the search ends. They include:

- A TXT file with a list of chemical equations representing the best pathway found.
- An HTML file with an interactive representation of the best pathway found (see section 4.1).
- An on-screen interactive representation of the best pathway found (see section 4.1).
- A TXT file with all pathways in the last generation, sorted according to their fitness. They are provided as lists of chemical equations.
- A YAML file containing all measurements made during the search (see section 4.2).
- A PDF file with plots of all measurements made during the search (see section 4.2).
- An image with plots of all measurements made during the search (see section 4.2).

---------------------------------
4.1- PATHWAY
---------------------------------
Pathway visualization is based on the d3 JavaScript library (www.d3js.org). The metabolic pathways found can be viewed directly on the results page, or by downloading "out/<name of the experiment>_pathway.html" and opening it with Firefox or Chrome.

Metabolic pathways are shown with the following set of lines and colors:

red          ---> initial substrate
yellow       ---> compounds to be produced from the initial substrate
blue         ---> reactions
light blue   ---> compounds produced in the pathway
violet       ---> external compounds
green        ---> abundant compounds
--
solid lines  ---> substrate of the reaction
dashed lines ---> product of the reaction

The external compounds are substrates used by some reaction in the search space and not synthesized by any other reaction. EvoMS determines them automatically by analyzing the products and substrates of the set of reactions.

The web representation can be manipulated interactively to rearrange the elements of the metabolic pathway using the mouse (drag and drop to lock a node, double-click to unlock it).
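The rule for external compounds described above can be illustrated with a short Python sketch. It is not the EvoMS implementation: the function names are hypothetical, the reactions use toy labels instead of KEGG codes, and the sketch simplifies the rule to "substrates that no reaction in the set produces". How abundant compounds interact with this rule is not covered here.

```python
def parse_semireaction(line):
    """Parse '<code>: A + B ---> C + D' into (code, substrates, products)."""
    code, equation = line.split(":", 1)
    left, right = equation.split("--->")
    # Keep only the last token of each term, so '2 A' is read as 'A'.
    substrates = {term.split()[-1] for term in left.split("+")}
    products = {term.split()[-1] for term in right.split("+")}
    return code.strip(), substrates, products

def external_compounds(lines):
    """Substrates that are never produced by any semireaction in the set."""
    all_substrates, all_products = set(), set()
    for line in lines:
        _, substrates, products = parse_semireaction(line)
        all_substrates |= substrates
        all_products |= products
    return all_substrates - all_products

# Toy search space with made-up labels.
reactions = [
    "R1: A + B ---> C",
    "R2: C + D ---> E",
]
print(sorted(external_compounds(reactions)))  # ['A', 'B', 'D']
```

In this toy set, C is consumed by R2 but produced by R1, so it is not external.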

---------------------------------
4.2- MEASURES
---------------------------------
The search is guided by a fitness function based on four terms, each normalized to the range [0,1]:

- Validity: the proportion of reactions (genes) in the pathway (chromosome) that have all the substrates they require.
- Related compounds: the fraction of compounds to relate that are already included in the pathway (chromosome).
- Rate of useful products: the fraction of reactions in the pathway that produce at least one new compound that has not already been included in it. It indicates which reactions are important and not redundant.
- Connectivity: the proportion of compounds to relate for which there exists a sequence of reactions that links them to the initial compound.

Evolution can be assessed through a set of six plots shown by EvoMS at the end of the search. The following outline indicates the layout of the plots:

+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+

1- Evolution of the pathway size:
   - red  ---> number of reactions in the best solution.
   - blue ---> average number of reactions in the population.
2- Histogram of pathway sizes in the current generation (mainly useful in the desktop version of EvoMS).
3- Evolution of the terms of the fitness function for the best individual:
   - red     ---> Validity
   - blue    ---> Related compounds
   - green   ---> Rate of useful products
   - magenta ---> Connectivity
4- Evolution of the average and maximum fitness, proportion of available reactions used, and proportion of pathways initialized with each compound to relate:
   - blue   ---> Fitness of the best individual.
   - red    ---> Average fitness of the population.
   - green  ---> Proportion of the reactions database used by the population.
   - others ---> Proportion of solutions starting with this compound. There are as many curves as initial compounds.
5- Individuals represented in terms of pathway size and fitness.
6- Evolution of the average values of the terms of the fitness function for the population:
   - red     ---> Validity
   - blue    ---> Related compounds
   - green   ---> Rate of useful products
   - magenta ---> Connectivity

=================================
5- EXAMPLES
=================================
The first example provided (COMPOUNDS_Example1.yaml, SETTINGS_Example1.yaml, REACTIONS_77.txt) corresponds to the search for a metabolic pathway linking alpha-D-Glucose-1P (C00103) and Glycerate-2P (C00631), where alpha-D-Glucose-1P (C00103) is specified as the initial substrate. The search space consists of 77 semireactions.

The second example provided (COMPOUNDS_Example2.yaml, SETTINGS_Example2.yaml, REACTIONS_154.txt) corresponds to the search for a metabolic pathway linking L-Glutamate (C00025), Fumarate (C00122) and D-Proline (C00763), without specifying the initial compound. The search space has 154 semireactions.
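
To make the first example more concrete, the sketch below mirrors in Python the compounds configuration described in section 2 for Example 1. It is an assumption-laden illustration: the real COMPOUNDS_Example1.yaml may differ, the abundant list is a placeholder built from the compounds named in section 2 (H2O, ATP and NAD, written with their KEGG codes), and the helper initial_candidates is hypothetical. The helper accepts both the string "yes" and the boolean True, because some YAML parsers read an unquoted yes as a boolean.

```python
def initial_candidates(relate):
    """Return the codes of the compounds marked as possible initial compounds."""
    return [entry["compound"] for entry in relate
            if entry["initial"] in (True, "yes")]

# Assumed structure for Example 1; the distributed file may differ.
example1 = {
    "abundant": ["C00001", "C00002", "C00003"],  # placeholder list
    "relate": [
        {"compound": "C00103", "initial": "yes"},  # alpha-D-Glucose-1P
        {"compound": "C00631", "initial": "no"},   # Glycerate-2P
    ],
}
print(initial_candidates(example1["relate"]))  # ['C00103']
```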