[Cdk-commits] SF.net SVN: cdk:[15472] cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex

Revision: 15472
          http://cdk.svn.sourceforge.net/cdk/?rev=15472&view=rev
Author:   egonw
Date:     2010-03-10 20:38:13 +0000 (Wed, 10 Mar 2010)

Log Message:
-----------
Applied more spelling fixes by Achim

Modified Paths:
--------------
    cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex

Modified: cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex
===================================================================

--- cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex	2010-03-10 20:10:51 UTC (rev 15471)
+++ cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex	2010-03-10 20:38:13 UTC (rev 15472)
@@ -347,7 +347,7 @@
 The molecules containing the substructure are reported in tabular form in a PDF file.
 
 \subsection*{Scenario 2: Descriptor Calculation}
-The Descriptor Calculation Workflow workflow, depicted in
+The descriptor calculation workflow, depicted in
 Figure~\ref{fig:QSARWorkflow}, starts with loading its molecules from a
 PostgreSQL database. The recognition of the atom types is the next step,
 followed by the addition of implicit hydrogens for each molecule as well as the
@@ -367,13 +367,14 @@
 calculate a large number of descriptors for many thousands of molecules in a
 reasonable time (see Figure~\ref{fig:TimeCalculateDescriptors}).
 
-\subsection*{Scenario 3: Iterative QSAR}
+\subsection*{Scenario 3: Iterative Descriptor Calculation}
 
-The iterative QSAR workflow is a \emph{work-around} which allows the treatment of 
-hundreds of thousands of molecules. 
+The iterative descriptor calculation workflow is a \emph{work-around} which
+allows the treatment of hundreds of thousands of molecules. 
 This workflow (see
 Figure~\ref{fig:IterativeQSARWorkflow}) processes each molecule in the same
-manner as the QSAR Calculation Workflow but it uses different database workers.
+manner as the non-iterative descriptor calculation workflow
+but it uses different database workers.
 Instead of the single database worker \texttt{Get\_Molecules\_From\_Database}, three database
 workers are applied: \texttt{Iterative\_Molecule\_From\_Database\_Reader}, 
 \texttt{Get\_Molecule\_From\_Database} and \texttt{Has\_Next\_Molecule\_From\_Database}.
@@ -381,7 +382,7 @@
 connection and store it within an internal object registry. The second worker 
 gets the ID of the database connection as an input and loads molecules from the 
 database. Only a subset of the original query is loaded using the SQL functions LIMIT
-and OFFSET. The last database worker checks whether the set of loaded molecules the
+and OFFSET. The last database worker checks whether the set of loaded molecules is
 the last of this query or if further molecules must be loaded. If the latter applies 
 the output of this last worker would be the text value \texttt{true}. A last but
 essential worker is \texttt{Fail\_if\_true}. This worker throws an exception if it gets 
@@ -400,7 +401,7 @@
 for an atom, the atom is flagged as \emph{unknown}. Based on the CDK's atom type
 perception functionality, we devised an example workflow (see
 Figure~\ref{fig:AtomTypingWF}) for the validation of the CDK atom typing
-procedures vs processed data. The detection of an unknown atom type by the CDK
+procedures. The detection of an unknown atom type by the CDK
 indicates that either the CDK lacks this specific atom type or the molecule
 contains chemically nonsensical atom types. In Figure~\ref{fig:AtomTypingWF} the
 \texttt{Perceive\_atom\_types} worker performs an atom type detection, followed
@@ -417,14 +418,14 @@
 molecules and showed that the CDK algorithms match the atom types quite well,
 but that the atom type list is not complete for metals and other heavy
 atoms (see Figure~\ref{fig:AtomTypingResults}).
-Incomplete atom type lists is not unique to the CDK and leads problems with the
-application of cheminformatics algorithms on, for example, coordination
-compounds containing metals, making atom type matching an import filter in
+Missing atom type definitions are a general problem for many cheminformatics algorithms
+and not unique to the CDK: they lead to severe problems and computation
+errors. Therefore, initial atom type perception is an important filter for
 cheminformatics workflows.  
 
 \subsection*{Scenario 5: Reaction Enumeration}	
 Markush structures are chemical drawings which represent a series
-of molecules by indicating location where differences occur.
+of molecules by indicating locations where differences occur.
 These locations are marked as \emph{Heterocyclic}, \emph{Alkyl}, or identified
 by an \emph{R} group, enumerating a series of possible groups, such
 as \emph{Methyl}, \emph{Isopropyl}, and \emph{Pentyl}.
@@ -438,11 +439,11 @@
 spaces, which includes the generation of chemical target libraries.
 The results of the enumeration have important applications in
 patent formulation and in High Throughput Screening (HTS). HTS
-experiments screen large amounts of small molecules, called a library,
+experiments screen large amounts of small molecules, called molecule libraries,
 against one or more assays for testing for biological activity. A
 couple of years ago, the libraries used for a single HTS experiment
-consisted of up to 10.000 to 100.000 molecules. Nowadays, more
-targeted libraries of a reduced size are use of up to 1.000 molecules,
+consisted of up to 100,000 molecules. Nowadays, more
+targeted libraries of a reduced size of up to 1,000 molecules are used,
 but still commonly defined using Markush structures.
 
 For reaction enumeration, a given reaction contains different building blocks,
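
Note on Scenario 3 in the diff above: the text describes loading only a subset of the original query per iteration with the SQL functions LIMIT and OFFSET, plus a check whether further molecules must be loaded. A minimal sketch of that paging pattern follows. It is not the implementation of the CDK-Taverna workers. The table name `molecules`, the columns `id` and `smiles`, and the functions `make_demo_db` and `iterate_batches` are hypothetical, and Python's built-in sqlite3 stands in for PostgreSQL so that the example is self-contained.

```python
import sqlite3


def make_demo_db(n_molecules):
    """Create an in-memory database with a hypothetical 'molecules' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE molecules (id INTEGER PRIMARY KEY, smiles TEXT)")
    conn.executemany(
        "INSERT INTO molecules (id, smiles) VALUES (?, ?)",
        [(i, "C" * i) for i in range(1, n_molecules + 1)],
    )
    conn.commit()
    return conn


def iterate_batches(conn, batch_size):
    """Yield (rows, has_next) pairs, loading one subset of the query at a time.

    Each iteration asks for one row more than the batch size. If that extra
    row comes back, further molecules must be loaded (has_next is True).
    """
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT id, smiles FROM molecules ORDER BY id LIMIT ? OFFSET ?",
            (batch_size + 1, offset),
        ).fetchall()
        has_next = len(rows) > batch_size
        yield rows[:batch_size], has_next
        if not has_next:
            break
        offset += batch_size
```

Typical use is a loop such as `for rows, has_next in iterate_batches(conn, 1000): ...`, where each batch is processed and discarded before the next one is loaded, so memory use is bounded by the batch size. The ORDER BY clause matters: without a stable ordering, LIMIT and OFFSET are not guaranteed to return disjoint subsets across iterations.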

