From: <and...@us...> - 2011-10-26 12:01:28
Revision: 15626
          http://cdk.svn.sourceforge.net/cdk/?rev=15626&view=rev
Author:   andreas1981
Date:     2011-10-26 12:01:21 +0000 (Wed, 26 Oct 2011)
Log Message:
-----------

Modified Paths:
--------------
    cdk-taverna-2-paper/bmc_article.tex

Modified: cdk-taverna-2-paper/bmc_article.tex
===================================================================
--- cdk-taverna-2-paper/bmc_article.tex 2011-10-26 06:58:15 UTC (rev 15625)
+++ cdk-taverna-2-paper/bmc_article.tex 2011-10-26 12:01:21 UTC (rev 15626)
@@ -308,12 +308,12 @@
 %% Results and Discussion %%
 %%
 \section*{Results and Discussion}
-The CDK-Taverna 2.0 plug-in provides 192 workers for input and output (I/O) of various chemical file and line notation formats, substructure filtering, aromaticity detection, atom typing, reaction enumeration, molecular descriptor calculation and data analysis. Parallel computing with multi-core processors by use of multiple concurrent threads is flexibly implemented for many workers where operations scale nearly linear with the number of cores. Especially the machine learning and the molecular descriptor calculation workers benefit from parallel computation. An overview is given in Table 1 and 2. Many workers are described by example workflows available at \url{http://cdk-taverna-2.ts-concepts.de/wiki/index.php?title=Main_Page}.
+The CDK-Taverna 2.0 plug-in provides 192 workers for input and output (I/O) of various chemical file and line notation formats, substructure filtering, aromaticity detection, atom typing, reaction enumeration, molecular descriptor calculation and data analysis. Parallel computing with multi-core processors by use of multiple concurrent threads is flexibly implemented for many workers where operations scale nearly linear with the number of cores. Especially the machine learning and the molecular descriptor calculation workers benefit from parallel computation. An overview is given in Table 1 and 2. 
Many workers are described by example workflows available at \url{http://cdk-taverna-2.ts-concepts.de/wiki/index.php?title=Main_Page}. Additionally, the workflows can be found at\url{http://www.myexperiment.org/}. CDK-Taverna 1.0 was confined to 32 bit Java virtual machine and thus was restricted to in-memory processing of data volumes of at most 2 gigabyte in practice. Version 2.0 also supports 64-bit computing by use of a 64-bit Java virtual machine so that the processable data volume is only limited by hardware constraints (memory, speed): 64-bit in-memory workflows were successfully performed with data sets of about 1 million small molecules. Since the memory restrictions of version 1.0 were a main reason to use Pgchem::tigress as a molecular database backend \cite{Kuhn} the corresponding version 1.0 workers were not migrated to the current version 2.0 yet.
 \subsection*{Advanced reaction enumeration}
-CDK-Taverna 1.0 provided basic functions for combinatorial chemistry related reaction enumeration: They supported the use of two reactants, a single product and one generic group per reactant. The new enumeration options offer major enhancements like multi-match detection, any number of reactants, products or generic groups as well as variable R-groups, ring sizes and atom definitions. The extended functionality was developed and applied in industrial cooperation projects.
+CDK-Taverna 1.0 provided basic functions for combinatorial chemistry related reaction enumeration: They supported the use of two reactants, a single product and one generic group per reactant. The new enumeration options used by CDK-Taverna 2.0 offer major enhancements like multi-match detection, any number of reactants, products or generic groups as well as variable R-groups, ring sizes and atom definitions. The extended functionality was developed and applied in industrial cooperation projects.
 Advanced reaction enumeration features are illustrated in Figure \ref{fig:ReactionEnumerationFeatures}. The \emph{Variable RGroup} feature allows the definition of chemical groups which can be flexibly attached to predefined atoms with syntax \emph{[A:B,B,B...-RC]} where \emph{A} is a freely selectable identifier, \emph{B} are numbers from an \emph{Atom-to-Atom-Mapping} defining the atoms to which the generic group can be attached and \emph{C} is the chemical group identifier which can be any number. The \emph{Atom Alias} feature offers the possibility to define a wild card for preconfigured elements. The syntax is \emph{[A:B,B,B...]} where \emph{A} is a freely selectable identifier and \emph{B} are the string representations of the possible elements. The \emph{Expandable Atom} feature enables the definition of freely sizeable rings or aliphatic chains with syntax \emph{[A:[]B]} where \emph{A} is a freely selectable identifier and \emph{B} is the maximum number of atoms to insert. Figure \ref{fig:ReactionEnumerationWorkflow} depicts a workflow for reaction enumeration. The capabilities of the advanced reaction enumerator implementation are summarized in Figure \ref{fig:ReactionEnumerationResults} which also demonstrates multi-match detection, i.e. multiple reaction centers within one molecule.