From: <eg...@us...> - 2010-03-10 20:38:20
Revision: 15472
          http://cdk.svn.sourceforge.net/cdk/?rev=15472&view=rev
Author:   egonw
Date:     2010-03-10 20:38:13 +0000 (Wed, 10 Mar 2010)

Log Message:
-----------
Applied more spelling fixes by Achim

Modified Paths:
--------------
    cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex

Modified: cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex
===================================================================
--- cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex 2010-03-10 20:10:51 UTC (rev 15471)
+++ cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex 2010-03-10 20:38:13 UTC (rev 15472)
@@ -347,7 +347,7 @@
 The molecules containing the substructure are reported in tabular form in a PDF file.

 \subsection*{Scenario 2: Descriptor Calculation}
-The Descriptor Calculation Workflow workflow, depicted in
+The descriptor calculation workflow, depicted in
 Figure~\ref{fig:QSARWorkflow}, starts with loading its molecules from a
 PostgreSQL database. The recognition of the atom types is the next step,
 followed by the addition of implicit hydrogens for each molecule as well as the
@@ -367,13 +367,14 @@
 calculate a large number of descriptors for many thousand of molecules in a
 reasonable time (see Figure~\ref{fig:TimeCalculateDescriptors}).

-\subsection*{Scenario 3: Iterative QSAR}
+\subsection*{Scenario 3: Iterative Descriptor Calculation}

-The iterative QSAR workflow is a \emph{work-around} which allows the treatment of
-hundreds of thousands of molecules.
+The iterative descriptor calculation workflow is a \emph{work-around} which
+allows the treatment of hundreds of thousands of molecules.
 This workflow (see Figure~\ref{fig:IterativeQSARWorkflow}) processes each
 molecule in the same
-manner as the QSAR Calculation Workflow but it uses different database workers.
+manner as the non-iterative descriptor calculation workflow
+but it uses different database workers.
 Instead of the single database worker \texttt{Get\_Molecules\_From\_Database}
 three database workers are applied: \texttt{Iterative\_Molecule\_From\_Database\_Reader},
 \texttt{Get\_Molecule\_From\_Database} and \texttt{Has\_Next\_Molecule\_From\_Database}.
@@ -381,7 +382,7 @@
 connection and store it within an internal object registry. The second worker gets
 the ID of the database connection as an input and loads molecules from the database.
 Only a subset of the original query is loaded using the SQL functions LIMIT
-and OFFSET. The last database worker checks whether the set of loaded molecules the
+and OFFSET. The last database worker checks whether the set of loaded molecules is the
 last of this query or if further molecules must be loaded. If the latter applies the
 output of this last worker would be the text value \texttt{true}. A last but
 essential worker is \texttt{Fail\_if\_true}. This worker throws an exception if it gets
@@ -400,7 +401,7 @@
 for an atom, the atom is flagged as \emph{unknown}. Based on the CDK's atom type
 perception functionality, we devised an example workflow (see
 Figure~\ref{fig:AtomTypingWF}) for the validation of the CDK atom typing
-procedures vs processed data. The detection of an unknown atom type by the CDK
+procedures. The detection of an unknown atom type by the CDK
 indicates that either the CDK lacks this specific atom type or the molecule
 contains chemically nonsensical atom types. In Figure~\ref{fig:AtomTypingWF} the
 \texttt{Perceive\_atom\_types} worker performs an atom type detection, followed
@@ -417,14 +418,14 @@
 molecules and showed that the CDK algorithms matches the atom types quite well,
 but that the atom type list is not complete for metals and other heavy atoms (see
 Figure~\ref{fig:AtomTypingResults}).
-Incomplete atom type lists is not unique to the CDK and leads problems with the
-application of cheminformatics algorithms on, for example, coordination
-compounds containing metals, making atom type matching an import filter in
+Missing atom type definitions is a general problem to many cheminformatics algorithms
+and not unique to the CDK: it leads to severe problems and computation
+error. Therefore, initial atom type perception is an important filter for
 cheminformatics workflows.

 \subsection*{Scenario 5: Reaction Enumeration}
 Markush structures are chemical drawings which represent a series
-of molecules by indicating location where differences occur.
+of molecules by indicating locations where differences occur.
 These locations are marked as \emph{Heterocyclic}, \emph{Alkyl}, or identified
 by an \emph{R} group, enumerating a series of possible groups, such as
 \emph{Methyl}, \emph{Isopropyl}, and \emph{Pentyl}.
@@ -438,11 +439,11 @@
 spaces, which includes the generation of chemical target libraries.
 The results of the enumeration has important applications in
 patent formulation and in High Throughput Screening (HTS). HTS
-experiments screen large amounts of small molecules, called a library,
+experiments screen large amounts of small molecules, called molecule libraries,
 against one or more assays for testing for biological activity.
 A couple of years ago, the libraries used for a single HTS experiment
-consisted of up to 10.000 to 100.000 molecules. Nowadays, more
-targeted libraries of a reduced size are use of up to 1.000 molecules,
+consisted of up to 100.000 molecules. Nowadays, more
+targeted libraries of a reduced size of up to 1.000 molecules are used,
 but still commonly defined using Markush structures.

 For reaction enumeration, a given reaction contains different building blocks,

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.